# -----------------------------------------------------------------------------
# ply: lex.py
#
# Copyright (C) 2001-2011,
# David M. Beazley (Dabeaz LLC)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * Neither the name of the David Beazley or Dabeaz LLC may be used to
#   endorse or promote products derived from this software without
#   specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# -----------------------------------------------------------------------------

__version__ = "3.4"
__tabversion__ = "3.2"       # Version of table file used

import re, sys, types, copy, os

# This tuple contains known string types
try:
    # Python 2.6
    StringTypes = (types.StringType, types.UnicodeType)
except AttributeError:
    # Python 3.0
    StringTypes = (str, bytes)

# Extract the code attribute of a function. Different implementations
# are for Python 2/3 compatibility.

if sys.version_info[0] < 3:
    def func_code(f):
        return f.func_code
else:
    def func_code(f):
        return f.__code__

# This regular expression is used to match valid token names
_is_identifier = re.compile(r'^[a-zA-Z0-9_]+$')

# Exception thrown when invalid token encountered and no default error
# handler is defined.

class LexError(Exception):
    def __init__(self,message,s):
        self.args = (message,)
        self.text = s

# Token class. This class is used to represent the tokens produced.
class LexToken(object):
    def __str__(self):
        return "LexToken(%s,%r,%d,%d)" % (self.type,self.value,self.lineno,self.lexpos)
    def __repr__(self):
        return str(self)

# This object is a stand-in for a logging object created by the
# logging module.

class PlyLogger(object):
    def __init__(self,f):
        self.f = f
    def critical(self,msg,*args,**kwargs):
        self.f.write((msg % args) + "\n")

    def warning(self,msg,*args,**kwargs):
        self.f.write("WARNING: "+ (msg % args) + "\n")

    def error(self,msg,*args,**kwargs):
        self.f.write("ERROR: " + (msg % args) + "\n")

    info = critical
    debug = critical

# Null logger is used when no output is generated. Does nothing.
class NullLogger(object):
    def __getattribute__(self,name):
        return self
    def __call__(self,*args,**kwargs):
        return self

# -----------------------------------------------------------------------------
#                        === Lexing Engine ===
#
# The following Lexer class implements the lexer runtime.
# There are only a few public methods and attributes:
#
#    input()          -  Store a new string in the lexer
#    token()          -  Get the next token
#    clone()          -  Clone the lexer
#
#    lineno           -  Current line number
#    lexpos           -  Current position in the input string
# -----------------------------------------------------------------------------

class Lexer:
    def __init__(self):
        self.lexre = None             # Master regular expression. This is a list of
                                      # tuples (re,findex) where re is a compiled
                                      # regular expression and findex is a list
                                      # mapping regex group numbers to rules
        self.lexretext = None         # Current regular expression strings
        self.lexstatere = {}          # Dictionary mapping lexer states to master regexs
        self.lexstateretext = {}      # Dictionary mapping lexer states to regex strings
        self.lexstaterenames = {}     # Dictionary mapping lexer states to symbol names
        self.lexstate = "INITIAL"     # Current lexer state
        self.lexstatestack = []       # Stack of lexer states
        self.lexstateinfo = None      # State information
        self.lexstateignore = {}      # Dictionary of ignored characters for each state
        self.lexstateerrorf = {}      # Dictionary of error functions for each state
        self.lexreflags = 0           # Optional re compile flags
        self.lexdata = None           # Actual input data (as a string)
        self.lexpos = 0               # Current position in input text
        self.lexlen = 0               # Length of the input text
        self.lexerrorf = None         # Error rule (if any)
        self.lextokens = None         # List of valid tokens
        self.lexignore = ""           # Ignored characters
        self.lexliterals = ""         # Literal characters that can be passed through
        self.lexmodule = None         # Module
        self.lineno = 1               # Current line number
        self.lexoptimize = 0          # Optimized mode

    def clone(self,object=None):
        c = copy.copy(self)

        # If the object parameter has been supplied, it means we are attaching the
        # lexer to a new object. In this case, we have to rebind all methods in
        # the lexstatere and lexstateerrorf tables.

        if object:
            newtab = { }
            for key, ritem in self.lexstatere.items():
                newre = []
                for cre, findex in ritem:
                    newfindex = []
                    for f in findex:
                        if not f or not f[0]:
                            newfindex.append(f)
                            continue
                        newfindex.append((getattr(object,f[0].__name__),f[1]))
                    newre.append((cre,newfindex))
                newtab[key] = newre
            c.lexstatere = newtab
            c.lexstateerrorf = { }
            for key, ef in self.lexstateerrorf.items():
                c.lexstateerrorf[key] = getattr(object,ef.__name__)
            c.lexmodule = object
        return c

    # ------------------------------------------------------------
    # writetab() - Write lexer information to a table file
    # ------------------------------------------------------------
    def writetab(self,tabfile,outputdir=""):
        if isinstance(tabfile,types.ModuleType):
            return
        basetabfilename = tabfile.split(".")[-1]
        filename = os.path.join(outputdir,basetabfilename)+".py"
        tf = open(filename,"w")
        tf.write("# %s.py. This file automatically created by PLY (version %s). Don't edit!\n" % (tabfile,__version__))
        tf.write("_tabversion = %s\n" % repr(__version__))
        tf.write("_lextokens = %s\n" % repr(self.lextokens))
        tf.write("_lexreflags = %s\n" % repr(self.lexreflags))
        tf.write("_lexliterals = %s\n" % repr(self.lexliterals))
        tf.write("_lexstateinfo = %s\n" % repr(self.lexstateinfo))

        tabre = { }
        # Collect all functions in the initial state
        initial = self.lexstatere["INITIAL"]
        initialfuncs = []
        for part in initial:
            for f in part[1]:
                if f and f[0]:
                    initialfuncs.append(f)

        for key, lre in self.lexstatere.items():
            titem = []
            for i in range(len(lre)):
                titem.append((self.lexstateretext[key][i],_funcs_to_names(lre[i][1],self.lexstaterenames[key][i])))
            tabre[key] = titem

        tf.write("_lexstatere = %s\n" % repr(tabre))
        tf.write("_lexstateignore = %s\n" % repr(self.lexstateignore))

        taberr = { }
        for key, ef in self.lexstateerrorf.items():
            if ef:
                taberr[key] = ef.__name__
            else:
                taberr[key] = None
        tf.write("_lexstateerrorf = %s\n" % repr(taberr))
        tf.close()

    # ------------------------------------------------------------
    # readtab() - Read lexer information from a tab file
    # ------------------------------------------------------------
    def readtab(self,tabfile,fdict):
        if isinstance(tabfile,types.ModuleType):
            lextab = tabfile
        else:
            if sys.version_info[0] < 3:
                exec("import %s as lextab" % tabfile)
            else:
                env = { }
                exec("import %s as lextab" % tabfile, env,env)
                lextab = env['lextab']

        if getattr(lextab,"_tabversion","0.0") != __version__:
            raise ImportError("Inconsistent PLY version")

        self.lextokens = lextab._lextokens
        self.lexreflags = lextab._lexreflags
        self.lexliterals = lextab._lexliterals
        self.lexstateinfo = lextab._lexstateinfo
        self.lexstateignore = lextab._lexstateignore
        self.lexstatere = { }
        self.lexstateretext = { }
        for key,lre in lextab._lexstatere.items():
            titem = []
            txtitem = []
            for i in range(len(lre)):
                titem.append((re.compile(lre[i][0],lextab._lexreflags | re.VERBOSE),_names_to_funcs(lre[i][1],fdict)))
                txtitem.append(lre[i][0])
            self.lexstatere[key] = titem
            self.lexstateretext[key] = txtitem
        self.lexstateerrorf = { }
        for key,ef in lextab._lexstateerrorf.items():
            self.lexstateerrorf[key] = fdict[ef]
        self.begin('INITIAL')

    # ------------------------------------------------------------
    # input() - Push a new string into the lexer
    # ------------------------------------------------------------
    def input(self,s):
        # Pull off the first character to see if s looks like a string
        c = s[:1]
        if not isinstance(c,StringTypes):
            raise ValueError("Expected a string")
        self.lexdata = s
        self.lexpos = 0
        self.lexlen = len(s)

    # ------------------------------------------------------------
    # begin() - Changes the lexing state
    # ------------------------------------------------------------
    def begin(self,state):
        if not state in self.lexstatere:
            raise ValueError("Undefined state")
        self.lexre = self.lexstatere[state]
        self.lexretext = self.lexstateretext[state]
        self.lexignore = self.lexstateignore.get(state,"")
        self.lexerrorf = self.lexstateerrorf.get(state,None)
        self.lexstate = state

    # ------------------------------------------------------------
    # push_state() - Changes the lexing state and saves old on stack
    # ------------------------------------------------------------
    def push_state(self,state):
        self.lexstatestack.append(self.lexstate)
        self.begin(state)

    # ------------------------------------------------------------
    # pop_state() - Restores the previous state
    # ------------------------------------------------------------
    def pop_state(self):
        self.begin(self.lexstatestack.pop())

    # ------------------------------------------------------------
    # current_state() - Returns the current lexing state
    # ------------------------------------------------------------
    def current_state(self):
        return self.lexstate

    # ------------------------------------------------------------
    # skip() - Skip ahead n characters
    # ------------------------------------------------------------
    def skip(self,n):
        self.lexpos += n

    # ------------------------------------------------------------
    # token() - Return the next token from the Lexer
    #
    # Note: This function has been carefully
    # implemented to be as fast as possible. Don't make changes unless
    # you really know what you are doing
    # ------------------------------------------------------------
    def token(self):
        # Make local copies of frequently referenced attributes
        lexpos = self.lexpos
        lexlen = self.lexlen
        lexignore = self.lexignore
        lexdata = self.lexdata

        while lexpos < lexlen:
            # This code provides some short-circuit code for whitespace, tabs, and other ignored characters
            if lexdata[lexpos] in lexignore:
                lexpos += 1
                continue

            # Look for a regular expression match
            for lexre,lexindexfunc in self.lexre:
                m = lexre.match(lexdata,lexpos)
                if not m: continue

                # Create a token for return
                tok = LexToken()
                tok.value = m.group()
                tok.lineno = self.lineno
                tok.lexpos = lexpos

                i = m.lastindex
                func,tok.type = lexindexfunc[i]

                if not func:
                    # If no token type was set, it's an ignored token
                    if tok.type:
                        self.lexpos = m.end()
                        return tok
                    else:
                        lexpos = m.end()
                        break

                lexpos = m.end()

                # If token is processed by a function, call it

                tok.lexer = self      # Set additional attributes useful in token rules
                self.lexmatch = m
                self.lexpos = lexpos

                newtok = func(tok)

                # Every function must return a token, if nothing, we just move to next token
                if not newtok:
                    lexpos = self.lexpos            # This is here in case user has updated lexpos.
                    lexignore = self.lexignore      # This is here in case there was a state change
                    break

                # Verify type of the token.
                # If not in the token map, raise an error
                if not self.lexoptimize:
                    if not newtok.type in self.lextokens:
                        raise LexError("%s:%d: Rule '%s' returned an unknown token type '%s'" % (
                            func_code(func).co_filename, func_code(func).co_firstlineno,
                            func.__name__, newtok.type),lexdata[lexpos:])

                return newtok
            else:
                # No match, see if in literals
                if lexdata[lexpos] in self.lexliterals:
                    tok = LexToken()
                    tok.value = lexdata[lexpos]
                    tok.lineno = self.lineno
                    tok.type = tok.value
                    tok.lexpos = lexpos
                    self.lexpos = lexpos + 1
                    return tok

                # No match. Call t_error() if defined.
                if self.lexerrorf:
                    tok = LexToken()
                    tok.value = self.lexdata[lexpos:]
                    tok.lineno = self.lineno
                    tok.type = "error"
                    tok.lexer = self
                    tok.lexpos = lexpos
                    self.lexpos = lexpos
                    newtok = self.lexerrorf(tok)
                    if lexpos == self.lexpos:
                        # Error method didn't change text position at all. This is an error.
                        raise LexError("Scanning error. Illegal character '%s'" % (lexdata[lexpos]), lexdata[lexpos:])
                    lexpos = self.lexpos
                    if not newtok: continue
                    return newtok

                self.lexpos = lexpos
                raise LexError("Illegal character '%s' at index %d" % (lexdata[lexpos],lexpos), lexdata[lexpos:])

        self.lexpos = lexpos + 1
        if self.lexdata is None:
            raise RuntimeError("No input string given with input()")
        return None

    # Iterator interface
    def __iter__(self):
        return self

    def next(self):
        t = self.token()
        if t is None:
            raise StopIteration
        return t

    __next__ = next

# -----------------------------------------------------------------------------
#                           ==== Lex Builder ===
#
# The functions and classes below are used to collect lexing information
# and build a Lexer object from it.
# -----------------------------------------------------------------------------

# -----------------------------------------------------------------------------
# get_caller_module_dict()
#
# This function returns a dictionary containing all of the symbols defined within
# a caller further down the call stack. This is used to get the environment
# associated with the yacc() call if none was provided.
# -----------------------------------------------------------------------------

def get_caller_module_dict(levels):
    try:
        raise RuntimeError
    except RuntimeError:
        e,b,t = sys.exc_info()
        f = t.tb_frame
        while levels > 0:
            f = f.f_back
            levels -= 1
        ldict = f.f_globals.copy()
        if f.f_globals != f.f_locals:
            ldict.update(f.f_locals)

        return ldict

# -----------------------------------------------------------------------------
# _funcs_to_names()
#
# Given a list of regular expression functions, this converts it to a list
# suitable for output to a table file
# -----------------------------------------------------------------------------

def _funcs_to_names(funclist,namelist):
    result = []
    for f,name in zip(funclist,namelist):
        if f and f[0]:
            result.append((name, f[1]))
        else:
            result.append(f)
    return result

# -----------------------------------------------------------------------------
# _names_to_funcs()
#
# Given a list of regular expression function names, this converts it back to
# functions.
# -----------------------------------------------------------------------------

def _names_to_funcs(namelist,fdict):
    result = []
    for n in namelist:
        if n and n[0]:
            result.append((fdict[n[0]],n[1]))
        else:
            result.append(n)
    return result

# -----------------------------------------------------------------------------
# _form_master_re()
#
# This function takes a list of all of the regex components and attempts to
# form the master regular expression.
# Given limitations in the Python re
# module, it may be necessary to break the master regex into separate expressions.
# -----------------------------------------------------------------------------

def _form_master_re(relist,reflags,ldict,toknames):
    if not relist: return []
    regex = "|".join(relist)
    try:
        lexre = re.compile(regex,re.VERBOSE | reflags)

        # Build the index to function map for the matching engine
        lexindexfunc = [ None ] * (max(lexre.groupindex.values())+1)
        lexindexnames = lexindexfunc[:]

        for f,i in lexre.groupindex.items():
            handle = ldict.get(f,None)
            if type(handle) in (types.FunctionType, types.MethodType):
                lexindexfunc[i] = (handle,toknames[f])
                lexindexnames[i] = f
            elif handle is not None:
                lexindexnames[i] = f
                if f.find("ignore_") > 0:
                    lexindexfunc[i] = (None,None)
                else:
                    lexindexfunc[i] = (None, toknames[f])

        return [(lexre,lexindexfunc)],[regex],[lexindexnames]
    except Exception:
        m = int(len(relist)/2)
        if m == 0: m = 1
        llist, lre, lnames = _form_master_re(relist[:m],reflags,ldict,toknames)
        rlist, rre, rnames = _form_master_re(relist[m:],reflags,ldict,toknames)
        return llist+rlist, lre+rre, lnames+rnames

# -----------------------------------------------------------------------------
# def _statetoken(s,names)
#
# Given a declaration name s of the form "t_" and a dictionary whose keys are
# state names, this function returns a tuple (states,tokenname) where states
# is a tuple of state names and tokenname is the name of the token.
# For example,
# calling this with s = "t_foo_bar_SPAM" might return (('foo','bar'),'SPAM')
# -----------------------------------------------------------------------------

def _statetoken(s,names):
    nonstate = 1
    parts = s.split("_")
    for i in range(1,len(parts)):
        if not parts[i] in names and parts[i] != 'ANY': break
    if i > 1:
        states = tuple(parts[1:i])
    else:
        states = ('INITIAL',)

    if 'ANY' in states:
        states = tuple(names)

    tokenname = "_".join(parts[i:])
    return (states,tokenname)


# -----------------------------------------------------------------------------
# LexerReflect()
#
# This class represents information needed to build a lexer as extracted from a
# user's input file.
# -----------------------------------------------------------------------------
class LexerReflect(object):
    def __init__(self,ldict,log=None,reflags=0):
        self.ldict = ldict
        self.error_func = None
        self.tokens = []
        self.reflags = reflags
        self.stateinfo = { 'INITIAL' : 'inclusive'}
        self.files = {}
        self.error = 0

        if log is None:
            self.log = PlyLogger(sys.stderr)
        else:
            self.log = log

    # Get all of the basic information
    def get_all(self):
        self.get_tokens()
        self.get_literals()
        self.get_states()
        self.get_rules()

    # Validate all of the information
    def validate_all(self):
        self.validate_tokens()
        self.validate_literals()
        self.validate_rules()
        return self.error

    # Get the tokens map
    def get_tokens(self):
        tokens = self.ldict.get("tokens",None)
        if not tokens:
            self.log.error("No token list is defined")
            self.error = 1
            return

        if not isinstance(tokens,(list, tuple)):
            self.log.error("tokens must be a list or tuple")
            self.error = 1
            return

        if not tokens:
            self.log.error("tokens is empty")
            self.error = 1
            return

        self.tokens = tokens

    # Validate the tokens
    def validate_tokens(self):
        terminals = {}
        for n in self.tokens:
            if not _is_identifier.match(n):
                self.log.error("Bad token name '%s'",n)
                self.error = 1
            if n in terminals:
                self.log.warning("Token '%s' multiply defined", n)
            terminals[n] = 1

    # Get the literals specifier
    def get_literals(self):
        self.literals = self.ldict.get("literals","")

    # Validate literals
    def validate_literals(self):
        try:
            for c in self.literals:
                if not isinstance(c,StringTypes) or len(c) > 1:
                    self.log.error("Invalid literal %s. Must be a single character", repr(c))
                    self.error = 1
                    continue

        except TypeError:
            self.log.error("Invalid literals specification. literals must be a sequence of characters")
            self.error = 1

    def get_states(self):
        self.states = self.ldict.get("states",None)
        # Build statemap
        if self.states:
            if not isinstance(self.states,(tuple,list)):
                self.log.error("states must be defined as a tuple or list")
                self.error = 1
            else:
                for s in self.states:
                    if not isinstance(s,tuple) or len(s) != 2:
                        self.log.error("Invalid state specifier %s. Must be a tuple (statename,'exclusive|inclusive')",repr(s))
                        self.error = 1
                        continue
                    name, statetype = s
                    if not isinstance(name,StringTypes):
                        self.log.error("State name %s must be a string", repr(name))
                        self.error = 1
                        continue
                    if not (statetype == 'inclusive' or statetype == 'exclusive'):
                        self.log.error("State type for state %s must be 'inclusive' or 'exclusive'",name)
                        self.error = 1
                        continue
                    if name in self.stateinfo:
                        self.log.error("State '%s' already defined",name)
                        self.error = 1
                        continue
                    self.stateinfo[name] = statetype

    # Get all of the symbols with a t_ prefix and sort them into various
    # categories (functions, strings, error functions, and ignore characters)

    def get_rules(self):
        tsymbols = [f for f in self.ldict if f[:2] == 't_' ]

        # Now build up a list of functions and a list of strings

        self.toknames = { }        # Mapping of symbols to token names
        self.funcsym = { }         # Symbols defined as functions
        self.strsym = { }          # Symbols defined as strings
        self.ignore = { }          # Ignore strings by state
        self.errorf = { }          # Error functions by state

        for s in self.stateinfo:
            self.funcsym[s] = []
            self.strsym[s] = []

        if len(tsymbols) == 0:
            self.log.error("No rules of the form t_rulename are defined")
            self.error = 1
            return

        for f in tsymbols:
            t = self.ldict[f]
            states, tokname = _statetoken(f,self.stateinfo)
            self.toknames[f] = tokname

            if hasattr(t,"__call__"):
                if tokname == 'error':
                    for s in states:
                        self.errorf[s] = t
                elif tokname == 'ignore':
                    line = func_code(t).co_firstlineno
                    file = func_code(t).co_filename
                    self.log.error("%s:%d: Rule '%s' must be defined as a string",file,line,t.__name__)
                    self.error = 1
                else:
                    for s in states:
                        self.funcsym[s].append((f,t))
            elif isinstance(t, StringTypes):
                if tokname == 'ignore':
                    for s in states:
                        self.ignore[s] = t
                    if "\\" in t:
                        self.log.warning("%s contains a literal backslash '\\'",f)

                elif tokname == 'error':
                    self.log.error("Rule '%s' must be defined as a function", f)
                    self.error = 1
                else:
                    for s in states:
                        self.strsym[s].append((f,t))
            else:
                self.log.error("%s not defined as a function or string", f)
                self.error = 1

        # Sort the functions by line number
        for f in self.funcsym.values():
            if sys.version_info[0] < 3:
                f.sort(lambda x,y: cmp(func_code(x[1]).co_firstlineno,func_code(y[1]).co_firstlineno))
            else:
                # Python 3.0
                f.sort(key=lambda x: func_code(x[1]).co_firstlineno)

        # Sort the strings by regular expression length
        for s in self.strsym.values():
            if sys.version_info[0] < 3:
                s.sort(lambda x,y: (len(x[1]) < len(y[1])) - (len(x[1]) > len(y[1])))
            else:
                # Python 3.0
                s.sort(key=lambda x: len(x[1]),reverse=True)

    # Validate all of the t_rules collected
    def validate_rules(self):
        for state in self.stateinfo:
            # Validate all rules defined by functions

            for fname, f in self.funcsym[state]:
                line = func_code(f).co_firstlineno
                file = func_code(f).co_filename
                self.files[file] = 1

                tokname = self.toknames[fname]
                if isinstance(f, types.MethodType):
                    reqargs = 2
                else:
                    reqargs = 1
                nargs = func_code(f).co_argcount
                if nargs > reqargs:
                    self.log.error("%s:%d: Rule '%s' has too many arguments",file,line,f.__name__)
                    self.error = 1
                    continue

                if nargs < reqargs:
                    self.log.error("%s:%d: Rule '%s' requires an argument", file,line,f.__name__)
                    self.error = 1
                    continue

                if not f.__doc__:
                    self.log.error("%s:%d: No regular expression defined for rule '%s'",file,line,f.__name__)
                    self.error = 1
                    continue

                try:
                    c = re.compile("(?P<%s>%s)" %
                                   (fname,f.__doc__), re.VERBOSE | self.reflags)
                    if c.match(""):
                        self.log.error("%s:%d: Regular expression for rule '%s' matches empty string", file,line,f.__name__)
                        self.error = 1
                except re.error:
                    _etype, e, _etrace = sys.exc_info()
                    self.log.error("%s:%d: Invalid regular expression for rule '%s'. %s", file,line,f.__name__,e)
                    if '#' in f.__doc__:
                        self.log.error("%s:%d. Make sure '#' in rule '%s' is escaped with '\\#'",file,line, f.__name__)
                    self.error = 1

            # Validate all rules defined by strings
            for name,r in self.strsym[state]:
                tokname = self.toknames[name]
                if tokname == 'error':
                    self.log.error("Rule '%s' must be defined as a function", name)
                    self.error = 1
                    continue

                if not tokname in self.tokens and tokname.find("ignore_") < 0:
                    self.log.error("Rule '%s' defined for an unspecified token %s",name,tokname)
                    self.error = 1
                    continue

                try:
                    c = re.compile("(?P<%s>%s)" %
                                   (name,r),re.VERBOSE | self.reflags)
                    if (c.match("")):
                        self.log.error("Regular expression for rule '%s' matches empty string",name)
                        self.error = 1
                except re.error:
                    _etype, e, _etrace = sys.exc_info()
                    self.log.error("Invalid regular expression for rule '%s'. %s",name,e)
                    if '#' in r:
                        self.log.error("Make sure '#' in rule '%s' is escaped with '\\#'",name)
                    self.error = 1

            if not self.funcsym[state] and not self.strsym[state]:
                self.log.error("No rules defined for state '%s'",state)
                self.error = 1

            # Validate the error function
            efunc = self.errorf.get(state,None)
            if efunc:
                f = efunc
                line = func_code(f).co_firstlineno
                file = func_code(f).co_filename
                self.files[file] = 1

                if isinstance(f, types.MethodType):
                    reqargs = 2
                else:
                    reqargs = 1
                nargs = func_code(f).co_argcount
                if nargs > reqargs:
                    self.log.error("%s:%d: Rule '%s' has too many arguments",file,line,f.__name__)
                    self.error = 1

                if nargs < reqargs:
                    self.log.error("%s:%d: Rule '%s' requires an argument", file,line,f.__name__)
                    self.error = 1

        for f in self.files:
            self.validate_file(f)


    # -----------------------------------------------------------------------------
    # validate_file()
    #
    # This checks to see if there are duplicated t_rulename() functions or strings
    # in the parser input file. This is done using a simple regular expression
    # match on each line in the given file.
    # -----------------------------------------------------------------------------

    def validate_file(self,filename):
        import os.path
        base,ext = os.path.splitext(filename)
        if ext != '.py': return         # No idea what the file is.
                                        # Return OK

        try:
            f = open(filename)
            lines = f.readlines()
            f.close()
        except IOError:
            return                      # Couldn't find the file. Don't worry about it

        fre = re.compile(r'\s*def\s+(t_[a-zA-Z_0-9]*)\(')
        sre = re.compile(r'\s*(t_[a-zA-Z_0-9]*)\s*=')

        counthash = { }
        linen = 1
        for l in lines:
            m = fre.match(l)
            if not m:
                m = sre.match(l)
            if m:
                name = m.group(1)
                prev = counthash.get(name)
                if not prev:
                    counthash[name] = linen
                else:
                    self.log.error("%s:%d: Rule %s redefined. Previously defined on line %d",filename,linen,name,prev)
                    self.error = 1
            linen += 1

# -----------------------------------------------------------------------------
# lex(module)
#
# Build all of the regular expression rules from definitions in the supplied module
# -----------------------------------------------------------------------------
def lex(module=None,object=None,debug=0,optimize=0,lextab="lextab",reflags=0,nowarn=0,outputdir="", debuglog=None, errorlog=None):
    global lexer
    ldict = None
    stateinfo = { 'INITIAL' : 'inclusive'}
    lexobj = Lexer()
    lexobj.lexoptimize = optimize
    global token,input

    if errorlog is None:
        errorlog = PlyLogger(sys.stderr)

    if debug:
        if debuglog is None:
            debuglog = PlyLogger(sys.stderr)

    # Get the module dictionary used for the lexer
    if object: module = object

Worker if module: 884*635a8641SAndroid Build Coastguard Worker _items = [(k,getattr(module,k)) for k in dir(module)] 885*635a8641SAndroid Build Coastguard Worker ldict = dict(_items) 886*635a8641SAndroid Build Coastguard Worker else: 887*635a8641SAndroid Build Coastguard Worker ldict = get_caller_module_dict(2) 888*635a8641SAndroid Build Coastguard Worker 889*635a8641SAndroid Build Coastguard Worker # Collect parser information from the dictionary 890*635a8641SAndroid Build Coastguard Worker linfo = LexerReflect(ldict,log=errorlog,reflags=reflags) 891*635a8641SAndroid Build Coastguard Worker linfo.get_all() 892*635a8641SAndroid Build Coastguard Worker if not optimize: 893*635a8641SAndroid Build Coastguard Worker if linfo.validate_all(): 894*635a8641SAndroid Build Coastguard Worker raise SyntaxError("Can't build lexer") 895*635a8641SAndroid Build Coastguard Worker 896*635a8641SAndroid Build Coastguard Worker if optimize and lextab: 897*635a8641SAndroid Build Coastguard Worker try: 898*635a8641SAndroid Build Coastguard Worker lexobj.readtab(lextab,ldict) 899*635a8641SAndroid Build Coastguard Worker token = lexobj.token 900*635a8641SAndroid Build Coastguard Worker input = lexobj.input 901*635a8641SAndroid Build Coastguard Worker lexer = lexobj 902*635a8641SAndroid Build Coastguard Worker return lexobj 903*635a8641SAndroid Build Coastguard Worker 904*635a8641SAndroid Build Coastguard Worker except ImportError: 905*635a8641SAndroid Build Coastguard Worker pass 906*635a8641SAndroid Build Coastguard Worker 907*635a8641SAndroid Build Coastguard Worker # Dump some basic debugging information 908*635a8641SAndroid Build Coastguard Worker if debug: 909*635a8641SAndroid Build Coastguard Worker debuglog.info("lex: tokens = %r", linfo.tokens) 910*635a8641SAndroid Build Coastguard Worker debuglog.info("lex: literals = %r", linfo.literals) 911*635a8641SAndroid Build Coastguard Worker debuglog.info("lex: states = %r", linfo.stateinfo) 912*635a8641SAndroid Build Coastguard Worker 
    # Build a dictionary of valid token names
    lexobj.lextokens = { }
    for n in linfo.tokens:
        lexobj.lextokens[n] = 1

    # Get literals specification
    if isinstance(linfo.literals,(list,tuple)):
        lexobj.lexliterals = type(linfo.literals[0])().join(linfo.literals)
    else:
        lexobj.lexliterals = linfo.literals

    # Get the stateinfo dictionary
    stateinfo = linfo.stateinfo

    regexs = { }
    # Collect the list of rule patterns for each state
    for state in stateinfo:
        regex_list = []

        # Add rules defined by functions first
        for fname, f in linfo.funcsym[state]:
            line = func_code(f).co_firstlineno
            file = func_code(f).co_filename
            regex_list.append("(?P<%s>%s)" % (fname,f.__doc__))
            if debug:
                debuglog.info("lex: Adding rule %s -> '%s' (state '%s')",fname,f.__doc__, state)

        # Now add all of the simple rules
        for name,r in linfo.strsym[state]:
            regex_list.append("(?P<%s>%s)" % (name,r))
            if debug:
                debuglog.info("lex: Adding rule %s -> '%s' (state '%s')",name,r, state)

        regexs[state] = regex_list

    # Build the master regular expressions

    if debug:
        debuglog.info("lex: ==== MASTER REGEXS FOLLOW ====")

    for state in regexs:
        lexre, re_text, re_names = _form_master_re(regexs[state],reflags,ldict,linfo.toknames)
        lexobj.lexstatere[state] = lexre
        lexobj.lexstateretext[state] = re_text
        lexobj.lexstaterenames[state] = re_names
        if debug:
            for i in range(len(re_text)):
                debuglog.info("lex: state '%s' : regex[%d] = '%s'",state, i, re_text[i])

    # For inclusive states, we need to add the regular expressions from the INITIAL state
    for state,stype in stateinfo.items():
        if state != "INITIAL" and stype == 'inclusive':
            lexobj.lexstatere[state].extend(lexobj.lexstatere['INITIAL'])
            lexobj.lexstateretext[state].extend(lexobj.lexstateretext['INITIAL'])
            lexobj.lexstaterenames[state].extend(lexobj.lexstaterenames['INITIAL'])

    lexobj.lexstateinfo = stateinfo
    lexobj.lexre = lexobj.lexstatere["INITIAL"]
    lexobj.lexretext = lexobj.lexstateretext["INITIAL"]
    lexobj.lexreflags = reflags

    # Set up ignore variables
    lexobj.lexstateignore = linfo.ignore
    lexobj.lexignore = lexobj.lexstateignore.get("INITIAL","")

    # Set up error functions
    lexobj.lexstateerrorf = linfo.errorf
    lexobj.lexerrorf = linfo.errorf.get("INITIAL",None)
    if not lexobj.lexerrorf:
        errorlog.warning("No t_error rule is defined")

    # Check state information for ignore and error rules
    for s,stype in stateinfo.items():
        if stype == 'exclusive':
            if s not in linfo.errorf:
                errorlog.warning("No error rule is defined for exclusive state '%s'", s)
            if s not in linfo.ignore and lexobj.lexignore:
                errorlog.warning("No ignore rule is defined for exclusive state '%s'", s)
        elif stype == 'inclusive':
            if s not in linfo.errorf:
                linfo.errorf[s] = linfo.errorf.get("INITIAL",None)
            if s not in linfo.ignore:
                linfo.ignore[s] = linfo.ignore.get("INITIAL","")

    # Create global versions of the token() and input() functions
    token = lexobj.token
    input = lexobj.input
    lexer = lexobj

    # If in optimize mode, we write the lextab
    if lextab and optimize:
        lexobj.writetab(lextab,outputdir)

    return lexobj

# -----------------------------------------------------------------------------
# runmain()
#
# This runs the lexer as a main program
# -----------------------------------------------------------------------------

def runmain(lexer=None,data=None):
    if not data:
        try:
            filename = sys.argv[1]
            f = open(filename)
            data = f.read()
            f.close()
        except IndexError:
            sys.stdout.write("Reading from standard input (type EOF to end):\n")
            data = sys.stdin.read()

    if lexer:
        _input = lexer.input
    else:
        _input = input
    _input(data)
    if lexer:
        _token = lexer.token
    else:
        _token = token

    while True:
        tok = _token()
        if not tok:
            break
        sys.stdout.write("(%s,%r,%d,%d)\n" % (tok.type, tok.value, tok.lineno,tok.lexpos))

# -----------------------------------------------------------------------------
# @TOKEN(regex)
#
# This decorator function can be used to set the regular expression on a function
# when its docstring might need to be set in an alternative way
# -----------------------------------------------------------------------------

def TOKEN(r):
    def set_doc(f):
        # r may be a pattern string or another callable whose docstring holds the pattern
        if hasattr(r,"__call__"):
            f.__doc__ = r.__doc__
        else:
            f.__doc__ = r
        return f
    return set_doc

# Alternative spelling of the TOKEN decorator
Token = TOKEN
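
# -----------------------------------------------------------------------------
# Usage sketch for @TOKEN (illustrative only). The names `identifier` and
# `t_ID` below are assumptions chosen for this example; they are not defined
# anywhere in this module. The decorator may help when a rule's pattern is
# assembled from other strings and so cannot be written as a literal docstring:
#
#     identifier = r'[a-zA-Z_][a-zA-Z0-9_]*'
#
#     @TOKEN(identifier)
#     def t_ID(t):
#         return t
#
# After decoration, t_ID.__doc__ holds the pattern string, which is the
# attribute lex() reads when it adds function rules to the master regex.
# -----------------------------------------------------------------------------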