"""Facility to use the Expat parser to load a minidom instance
from a string or file.

This avoids all the overhead of SAX and pulldom to gain performance.
"""

# Warning!
#
# This module is tightly bound to the implementation details of the
# minidom DOM and can't be used with other DOM implementations.  This
# is due, in part, to a lack of appropriate methods in the DOM (there is
# no way to create Entity and Notation nodes via the DOM Level 2
# interface), and for performance.  The latter is the cause of some fairly
# cryptic code.
#
# Performance hacks:
#
#   -  .character_data_handler() has an extra case in which continuing
#      data is appended to an existing Text node; this can be a
#      speedup since pyexpat can break up character data into multiple
#      callbacks even though we set the buffer_text attribute on the
#      parser.  This also gives us the advantage that we don't need a
#      separate normalization pass.
#
#   -  Determining that a node exists is done using an identity comparison
#      with None rather than a truth test; this avoids searching for and
#      calling any methods on the node object if it exists.  (A rather
#      nice speedup is achieved this way as well!)

from xml.dom import xmlbuilder, minidom, Node
from xml.dom import EMPTY_NAMESPACE, EMPTY_PREFIX, XMLNS_NAMESPACE
from xml.parsers import expat
from xml.dom.minidom import _append_child, _set_attribute_node
from xml.dom.NodeFilter import NodeFilter

TEXT_NODE = Node.TEXT_NODE
CDATA_SECTION_NODE = Node.CDATA_SECTION_NODE
DOCUMENT_NODE = Node.DOCUMENT_NODE

FILTER_ACCEPT = xmlbuilder.DOMBuilderFilter.FILTER_ACCEPT
FILTER_REJECT = xmlbuilder.DOMBuilderFilter.FILTER_REJECT
FILTER_SKIP = xmlbuilder.DOMBuilderFilter.FILTER_SKIP
FILTER_INTERRUPT = xmlbuilder.DOMBuilderFilter.FILTER_INTERRUPT

theDOMImplementation = minidom.getDOMImplementation()

# Expat typename -> TypeInfo
_typeinfo_map = {
    "CDATA":    minidom.TypeInfo(None, "cdata"),
    "ENUM":     minidom.TypeInfo(None, "enumeration"),
    "ENTITY":   minidom.TypeInfo(None, "entity"),
    "ENTITIES": minidom.TypeInfo(None, "entities"),
    "ID":       minidom.TypeInfo(None, "id"),
    "IDREF":    minidom.TypeInfo(None, "idref"),
    "IDREFS":   minidom.TypeInfo(None, "idrefs"),
    "NMTOKEN":  minidom.TypeInfo(None, "nmtoken"),
    "NMTOKENS": minidom.TypeInfo(None, "nmtokens"),
    }

class ElementInfo(object):
    __slots__ = '_attr_info', '_model', 'tagName'

    def __init__(self, tagName, model=None):
        self.tagName = tagName
        self._attr_info = []
        self._model = model

    def __getstate__(self):
        return self._attr_info, self._model, self.tagName

    def __setstate__(self, state):
        self._attr_info, self._model, self.tagName = state

    def getAttributeType(self, aname):
        for info in self._attr_info:
            if info[1] == aname:
                t = info[-2]
                if t[0] == "(":
                    return _typeinfo_map["ENUM"]
                else:
                    return _typeinfo_map[info[-2]]
        return minidom._no_type

    def getAttributeTypeNS(self, namespaceURI, localName):
        return minidom._no_type

    def isElementContent(self):
        if self._model:
            type = self._model[0]
            return type not in (expat.model.XML_CTYPE_ANY,
                                expat.model.XML_CTYPE_MIXED)
        else:
            return False

    def isEmpty(self):
        if self._model:
            return self._model[0] == expat.model.XML_CTYPE_EMPTY
        else:
            return False

    def isId(self, aname):
        for info in self._attr_info:
            if info[1] == aname:
                return info[-2] == "ID"
        return False

    def isIdNS(self, euri, ename, auri, aname):
        # not sure this is meaningful
        return self.isId((auri, aname))

def _intern(builder, s):
    return builder._intern_setdefault(s, s)

def _parse_ns_name(builder, name):
    assert ' ' in name
    parts = name.split(' ')
    intern = builder._intern_setdefault
    if len(parts) == 3:
        uri, localname, prefix = parts
        prefix = intern(prefix, prefix)
        qname = "%s:%s" % (prefix, localname)
        qname = intern(qname, qname)
        localname = intern(localname, localname)
    elif len(parts) == 2:
        uri, localname = parts
        prefix = EMPTY_PREFIX
        qname = localname = intern(localname, localname)
    else:
        raise ValueError("Unsupported syntax: spaces in URIs not supported: %r" % name)
    return intern(uri, uri), localname, prefix, qname


class ExpatBuilder:
    """Document builder that uses Expat to build a ParsedXML.DOM document
    instance."""

    def __init__(self, options=None):
        if options is None:
            options = xmlbuilder.Options()
        self._options = options
        if self._options.filter is not None:
            self._filter = FilterVisibilityController(self._options.filter)
        else:
            self._filter = None
            # This *really* doesn't do anything in this case, so
            # override it with something fast & minimal.
            self._finish_start_element = id
        self._parser = None
        self.reset()

    def createParser(self):
        """Create a new parser object."""
        return expat.ParserCreate()

    def getParser(self):
        """Return the parser object, creating a new one if needed."""
        if not self._parser:
            self._parser = self.createParser()
            self._intern_setdefault = self._parser.intern.setdefault
            self._parser.buffer_text = True
            self._parser.ordered_attributes = True
            self._parser.specified_attributes = True
            self.install(self._parser)
        return self._parser

    def reset(self):
        """Free all data structures used during DOM construction."""
        self.document = theDOMImplementation.createDocument(
            EMPTY_NAMESPACE, None, None)
        self.curNode = self.document
        self._elem_info = self.document._elem_info
        self._cdata = False

    def install(self, parser):
        """Install the callbacks needed to build the DOM into the parser."""
        # This creates circular references!
        parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
        parser.StartElementHandler = self.first_element_handler
        parser.EndElementHandler = self.end_element_handler
        parser.ProcessingInstructionHandler = self.pi_handler
        if self._options.entities:
            parser.EntityDeclHandler = self.entity_decl_handler
        parser.NotationDeclHandler = self.notation_decl_handler
        if self._options.comments:
            parser.CommentHandler = self.comment_handler
        if self._options.cdata_sections:
            parser.StartCdataSectionHandler = self.start_cdata_section_handler
            parser.EndCdataSectionHandler = self.end_cdata_section_handler
            parser.CharacterDataHandler = self.character_data_handler_cdata
        else:
            parser.CharacterDataHandler = self.character_data_handler
        parser.ExternalEntityRefHandler = self.external_entity_ref_handler
        parser.XmlDeclHandler = self.xml_decl_handler
        parser.ElementDeclHandler = self.element_decl_handler
        parser.AttlistDeclHandler = self.attlist_decl_handler

    def parseFile(self, file):
        """Parse a document from a file object, returning the document
        node."""
        parser = self.getParser()
        first_buffer = True
        try:
            while 1:
                buffer = file.read(16*1024)
                if not buffer:
                    break
                parser.Parse(buffer, False)
                if first_buffer and self.document.documentElement:
                    self._setup_subset(buffer)
                first_buffer = False
            parser.Parse(b"", True)
        except ParseEscape:
            pass
        doc = self.document
        self.reset()
        self._parser = None
        return doc

    def parseString(self, string):
        """Parse a document from a string, returning the document node."""
        parser = self.getParser()
        try:
            parser.Parse(string, True)
            self._setup_subset(string)
        except ParseEscape:
            pass
        doc = self.document
        self.reset()
        self._parser = None
        return doc

    def _setup_subset(self, buffer):
        """Load the internal subset if there might be one."""
        if self.document.doctype:
            extractor = InternalSubsetExtractor()
            extractor.parseString(buffer)
            subset = extractor.getSubset()
            self.document.doctype.internalSubset = subset

    def start_doctype_decl_handler(self, doctypeName, systemId, publicId,
                                   has_internal_subset):
        doctype = self.document.implementation.createDocumentType(
            doctypeName, publicId, systemId)
        doctype.ownerDocument = self.document
        _append_child(self.document, doctype)
        self.document.doctype = doctype
        if self._filter and self._filter.acceptNode(doctype) == FILTER_REJECT:
            self.document.doctype = None
            del self.document.childNodes[-1]
            doctype = None
            self._parser.EntityDeclHandler = None
            self._parser.NotationDeclHandler = None
        if has_internal_subset:
            if doctype is not None:
                doctype.entities._seq = []
                doctype.notations._seq = []
            self._parser.CommentHandler = None
            self._parser.ProcessingInstructionHandler = None
            self._parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler

    def end_doctype_decl_handler(self):
        if self._options.comments:
            self._parser.CommentHandler = self.comment_handler
        self._parser.ProcessingInstructionHandler = self.pi_handler
        if not (self._elem_info or self._filter):
            self._finish_end_element = id

    def pi_handler(self, target, data):
        node = self.document.createProcessingInstruction(target, data)
        _append_child(self.curNode, node)
        if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
            self.curNode.removeChild(node)

    def character_data_handler_cdata(self, data):
        childNodes = self.curNode.childNodes
        if self._cdata:
            if (self._cdata_continue
                    and childNodes[-1].nodeType == CDATA_SECTION_NODE):
                childNodes[-1].appendData(data)
                return
            node = self.document.createCDATASection(data)
            self._cdata_continue = True
        elif childNodes and childNodes[-1].nodeType == TEXT_NODE:
            node = childNodes[-1]
            value = node.data + data
            node.data = value
            return
        else:
            node = minidom.Text()
            node.data = data
            node.ownerDocument = self.document
        _append_child(self.curNode, node)

    def character_data_handler(self, data):
        childNodes = self.curNode.childNodes
        if childNodes and childNodes[-1].nodeType == TEXT_NODE:
            node = childNodes[-1]
            node.data = node.data + data
            return
        node = minidom.Text()
        node.data = node.data + data
        node.ownerDocument = self.document
        _append_child(self.curNode, node)

    def entity_decl_handler(self, entityName, is_parameter_entity, value,
                            base, systemId, publicId, notationName):
        if is_parameter_entity:
            # we don't care about parameter entities for the DOM
            return
        if not self._options.entities:
            return
        node = self.document._create_entity(entityName, publicId,
                                            systemId, notationName)
        if value is not None:
            # internal entity
            # node *should* be readonly, but we'll cheat
            child = self.document.createTextNode(value)
            node.childNodes.append(child)
        self.document.doctype.entities._seq.append(node)
        if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
            del self.document.doctype.entities._seq[-1]

    def notation_decl_handler(self, notationName, base, systemId, publicId):
        node = self.document._create_notation(notationName, publicId, systemId)
        self.document.doctype.notations._seq.append(node)
        # Drop the notation only when the filter rejects it, mirroring
        # entity_decl_handler().
        if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
            del self.document.doctype.notations._seq[-1]

    def comment_handler(self, data):
        node = self.document.createComment(data)
        _append_child(self.curNode, node)
        if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
            self.curNode.removeChild(node)

    def start_cdata_section_handler(self):
        self._cdata = True
        self._cdata_continue = False

    def end_cdata_section_handler(self):
        self._cdata = False
        self._cdata_continue = False

    def external_entity_ref_handler(self, context, base, systemId, publicId):
        return 1

    def first_element_handler(self, name, attributes):
        if self._filter is None and not self._elem_info:
            self._finish_end_element = id
        self.getParser().StartElementHandler = self.start_element_handler
        self.start_element_handler(name, attributes)

    def start_element_handler(self, name, attributes):
        node = self.document.createElement(name)
        _append_child(self.curNode, node)
        self.curNode = node

357*cda5da8dSAndroid Build Coastguard Worker if attributes: 358*cda5da8dSAndroid Build Coastguard Worker for i in range(0, len(attributes), 2): 359*cda5da8dSAndroid Build Coastguard Worker a = minidom.Attr(attributes[i], EMPTY_NAMESPACE, 360*cda5da8dSAndroid Build Coastguard Worker None, EMPTY_PREFIX) 361*cda5da8dSAndroid Build Coastguard Worker value = attributes[i+1] 362*cda5da8dSAndroid Build Coastguard Worker a.value = value 363*cda5da8dSAndroid Build Coastguard Worker a.ownerDocument = self.document 364*cda5da8dSAndroid Build Coastguard Worker _set_attribute_node(node, a) 365*cda5da8dSAndroid Build Coastguard Worker 366*cda5da8dSAndroid Build Coastguard Worker if node is not self.document.documentElement: 367*cda5da8dSAndroid Build Coastguard Worker self._finish_start_element(node) 368*cda5da8dSAndroid Build Coastguard Worker 369*cda5da8dSAndroid Build Coastguard Worker def _finish_start_element(self, node): 370*cda5da8dSAndroid Build Coastguard Worker if self._filter: 371*cda5da8dSAndroid Build Coastguard Worker # To be general, we'd have to call isSameNode(), but this 372*cda5da8dSAndroid Build Coastguard Worker # is sufficient for minidom: 373*cda5da8dSAndroid Build Coastguard Worker if node is self.document.documentElement: 374*cda5da8dSAndroid Build Coastguard Worker return 375*cda5da8dSAndroid Build Coastguard Worker filt = self._filter.startContainer(node) 376*cda5da8dSAndroid Build Coastguard Worker if filt == FILTER_REJECT: 377*cda5da8dSAndroid Build Coastguard Worker # ignore this node & all descendents 378*cda5da8dSAndroid Build Coastguard Worker Rejecter(self) 379*cda5da8dSAndroid Build Coastguard Worker elif filt == FILTER_SKIP: 380*cda5da8dSAndroid Build Coastguard Worker # ignore this node, but make it's children become 381*cda5da8dSAndroid Build Coastguard Worker # children of the parent node 382*cda5da8dSAndroid Build Coastguard Worker Skipper(self) 383*cda5da8dSAndroid Build Coastguard Worker else: 384*cda5da8dSAndroid Build Coastguard Worker 
return 385*cda5da8dSAndroid Build Coastguard Worker self.curNode = node.parentNode 386*cda5da8dSAndroid Build Coastguard Worker node.parentNode.removeChild(node) 387*cda5da8dSAndroid Build Coastguard Worker node.unlink() 388*cda5da8dSAndroid Build Coastguard Worker 389*cda5da8dSAndroid Build Coastguard Worker # If this ever changes, Namespaces.end_element_handler() needs to 390*cda5da8dSAndroid Build Coastguard Worker # be changed to match. 391*cda5da8dSAndroid Build Coastguard Worker # 392*cda5da8dSAndroid Build Coastguard Worker def end_element_handler(self, name): 393*cda5da8dSAndroid Build Coastguard Worker curNode = self.curNode 394*cda5da8dSAndroid Build Coastguard Worker self.curNode = curNode.parentNode 395*cda5da8dSAndroid Build Coastguard Worker self._finish_end_element(curNode) 396*cda5da8dSAndroid Build Coastguard Worker 397*cda5da8dSAndroid Build Coastguard Worker def _finish_end_element(self, curNode): 398*cda5da8dSAndroid Build Coastguard Worker info = self._elem_info.get(curNode.tagName) 399*cda5da8dSAndroid Build Coastguard Worker if info: 400*cda5da8dSAndroid Build Coastguard Worker self._handle_white_text_nodes(curNode, info) 401*cda5da8dSAndroid Build Coastguard Worker if self._filter: 402*cda5da8dSAndroid Build Coastguard Worker if curNode is self.document.documentElement: 403*cda5da8dSAndroid Build Coastguard Worker return 404*cda5da8dSAndroid Build Coastguard Worker if self._filter.acceptNode(curNode) == FILTER_REJECT: 405*cda5da8dSAndroid Build Coastguard Worker self.curNode.removeChild(curNode) 406*cda5da8dSAndroid Build Coastguard Worker curNode.unlink() 407*cda5da8dSAndroid Build Coastguard Worker 408*cda5da8dSAndroid Build Coastguard Worker def _handle_white_text_nodes(self, node, info): 409*cda5da8dSAndroid Build Coastguard Worker if (self._options.whitespace_in_element_content 410*cda5da8dSAndroid Build Coastguard Worker or not info.isElementContent()): 411*cda5da8dSAndroid Build Coastguard Worker return 412*cda5da8dSAndroid Build 
Coastguard Worker 413*cda5da8dSAndroid Build Coastguard Worker # We have element type information and should remove ignorable 414*cda5da8dSAndroid Build Coastguard Worker # whitespace; identify for text nodes which contain only 415*cda5da8dSAndroid Build Coastguard Worker # whitespace. 416*cda5da8dSAndroid Build Coastguard Worker L = [] 417*cda5da8dSAndroid Build Coastguard Worker for child in node.childNodes: 418*cda5da8dSAndroid Build Coastguard Worker if child.nodeType == TEXT_NODE and not child.data.strip(): 419*cda5da8dSAndroid Build Coastguard Worker L.append(child) 420*cda5da8dSAndroid Build Coastguard Worker 421*cda5da8dSAndroid Build Coastguard Worker # Remove ignorable whitespace from the tree. 422*cda5da8dSAndroid Build Coastguard Worker for child in L: 423*cda5da8dSAndroid Build Coastguard Worker node.removeChild(child) 424*cda5da8dSAndroid Build Coastguard Worker 425*cda5da8dSAndroid Build Coastguard Worker def element_decl_handler(self, name, model): 426*cda5da8dSAndroid Build Coastguard Worker info = self._elem_info.get(name) 427*cda5da8dSAndroid Build Coastguard Worker if info is None: 428*cda5da8dSAndroid Build Coastguard Worker self._elem_info[name] = ElementInfo(name, model) 429*cda5da8dSAndroid Build Coastguard Worker else: 430*cda5da8dSAndroid Build Coastguard Worker assert info._model is None 431*cda5da8dSAndroid Build Coastguard Worker info._model = model 432*cda5da8dSAndroid Build Coastguard Worker 433*cda5da8dSAndroid Build Coastguard Worker def attlist_decl_handler(self, elem, name, type, default, required): 434*cda5da8dSAndroid Build Coastguard Worker info = self._elem_info.get(elem) 435*cda5da8dSAndroid Build Coastguard Worker if info is None: 436*cda5da8dSAndroid Build Coastguard Worker info = ElementInfo(elem) 437*cda5da8dSAndroid Build Coastguard Worker self._elem_info[elem] = info 438*cda5da8dSAndroid Build Coastguard Worker info._attr_info.append( 439*cda5da8dSAndroid Build Coastguard Worker [None, name, None, None, default, 0, 
             type, required])

    def xml_decl_handler(self, version, encoding, standalone):
        self.document.version = version
        self.document.encoding = encoding
        # This is still a little ugly, thanks to the pyexpat API. ;-(
        if standalone >= 0:
            if standalone:
                self.document.standalone = True
            else:
                self.document.standalone = False


# Don't include FILTER_INTERRUPT, since that's checked separately
# where allowed.
_ALLOWED_FILTER_RETURNS = (FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP)

class FilterVisibilityController(object):
    """Wrapper around a DOMBuilderFilter which implements the checks
    to make the whatToShow filter attribute work."""

    __slots__ = 'filter',

    def __init__(self, filter):
        self.filter = filter

    def startContainer(self, node):
        mask = self._nodetype_mask[node.nodeType]
        if self.filter.whatToShow & mask:
            val = self.filter.startContainer(node)
            if val == FILTER_INTERRUPT:
                raise ParseEscape
            if val not in _ALLOWED_FILTER_RETURNS:
                raise ValueError(
                    "startContainer() returned illegal value: " + repr(val))
            return val
        else:
            return FILTER_ACCEPT

    def acceptNode(self, node):
        mask = self._nodetype_mask[node.nodeType]
        if self.filter.whatToShow & mask:
            val = self.filter.acceptNode(node)
            if val == FILTER_INTERRUPT:
                raise ParseEscape
            if val == FILTER_SKIP:
                # move all child nodes to the parent, and remove this node
                parent = node.parentNode
                for child in node.childNodes[:]:
                    parent.appendChild(child)
                # node is handled by the caller
                return FILTER_REJECT
            if val not in _ALLOWED_FILTER_RETURNS:
                raise ValueError(
                    "acceptNode() returned illegal value: " + repr(val))
            return val
        else:
            return FILTER_ACCEPT

    _nodetype_mask = {
        Node.ELEMENT_NODE: NodeFilter.SHOW_ELEMENT,
        Node.ATTRIBUTE_NODE: NodeFilter.SHOW_ATTRIBUTE,
        Node.TEXT_NODE: NodeFilter.SHOW_TEXT,
        Node.CDATA_SECTION_NODE: NodeFilter.SHOW_CDATA_SECTION,
        Node.ENTITY_REFERENCE_NODE: NodeFilter.SHOW_ENTITY_REFERENCE,
        Node.ENTITY_NODE: NodeFilter.SHOW_ENTITY,
        Node.PROCESSING_INSTRUCTION_NODE: NodeFilter.SHOW_PROCESSING_INSTRUCTION,
        Node.COMMENT_NODE:
            NodeFilter.SHOW_COMMENT,
        Node.DOCUMENT_NODE: NodeFilter.SHOW_DOCUMENT,
        Node.DOCUMENT_TYPE_NODE: NodeFilter.SHOW_DOCUMENT_TYPE,
        Node.DOCUMENT_FRAGMENT_NODE: NodeFilter.SHOW_DOCUMENT_FRAGMENT,
        Node.NOTATION_NODE: NodeFilter.SHOW_NOTATION,
        }


class FilterCrutch(object):
    __slots__ = '_builder', '_level', '_old_start', '_old_end'

    def __init__(self, builder):
        self._level = 0
        self._builder = builder
        parser = builder._parser
        self._old_start = parser.StartElementHandler
        self._old_end = parser.EndElementHandler
        parser.StartElementHandler = self.start_element_handler
        parser.EndElementHandler = self.end_element_handler

class Rejecter(FilterCrutch):
    __slots__ = ()

    def __init__(self, builder):
        FilterCrutch.__init__(self, builder)
        parser = builder._parser
        for name in ("ProcessingInstructionHandler",
                     "CommentHandler",
                     "CharacterDataHandler",
                     "StartCdataSectionHandler",
                     "EndCdataSectionHandler",
                     "ExternalEntityRefHandler",
                     ):
            setattr(parser, name, None)

    def start_element_handler(self, *args):
        self._level = self._level + 1

    def end_element_handler(self, *args):
        if self._level == 0:
            # restore the old handlers
            parser = self._builder._parser
            self._builder.install(parser)
            parser.StartElementHandler = self._old_start
            parser.EndElementHandler = self._old_end
        else:
            self._level = self._level - 1

class Skipper(FilterCrutch):
    __slots__ = ()

    def start_element_handler(self, *args):
        node = self._builder.curNode
        self._old_start(*args)
        if self._builder.curNode is not node:
            self._level = self._level + 1

    def end_element_handler(self, *args):
        if self._level == 0:
            # We're popping back out of the node we're skipping, so we
            # shouldn't need to do anything but reset the handlers.
            self._builder._parser.StartElementHandler = self._old_start
            self._builder._parser.EndElementHandler = self._old_end
            self._builder = None
        else:
            self._level = self._level - 1
            self._old_end(*args)


# framework document used by the fragment builder.
# Takes a string for the doctype, subset string, and namespace attrs string.

_FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID = \
    "http://xml.python.org/entities/fragment-builder/internal"

_FRAGMENT_BUILDER_TEMPLATE = (
    '''\
<!DOCTYPE wrapper
  %%s [
  <!ENTITY fragment-builder-internal
    SYSTEM "%s">
%%s
]>
<wrapper %%s
>&fragment-builder-internal;</wrapper>'''
    % _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID)


class FragmentBuilder(ExpatBuilder):
    """Builder which constructs document fragments given XML source
    text and a context node.

    The context node is expected to provide information about the
    namespace declarations which are in scope at the start of the
    fragment.
    """

    def __init__(self, context, options=None):
        if context.nodeType == DOCUMENT_NODE:
            self.originalDocument = context
            self.context = context
        else:
            self.originalDocument = context.ownerDocument
            self.context = context
        ExpatBuilder.__init__(self, options)

    def reset(self):
        ExpatBuilder.reset(self)
        self.fragment = None

    def parseFile(self, file):
        """Parse a document fragment from a file object, returning the
        fragment node."""
        return self.parseString(file.read())

    def parseString(self, string):
        """Parse a document fragment from a string, returning the
        fragment node."""
        self._source = string
        parser = self.getParser()
        doctype = self.originalDocument.doctype
        ident = ""
        if doctype:
            subset = doctype.internalSubset or self._getDeclarations()
            if doctype.publicId:
                ident = ('PUBLIC "%s" "%s"'
                         % (doctype.publicId, doctype.systemId))
            elif doctype.systemId:
                ident = 'SYSTEM "%s"' % doctype.systemId
        else:
            subset = ""
        nsattrs = self._getNSattrs() # get ns decls from node's ancestors
        document = _FRAGMENT_BUILDER_TEMPLATE % (ident, subset, nsattrs)
        try:
            parser.Parse(document, True)
        except:
            self.reset()
            raise
        fragment = self.fragment
        self.reset()
##         self._parser = None
        return fragment

    def _getDeclarations(self):
        """Re-create the internal subset from the DocumentType node.

        This is only needed if we don't already have the
        internalSubset as a string.
        """
        doctype = self.context.ownerDocument.doctype
        s = ""
        if doctype:
            for i in range(doctype.notations.length):
                notation = doctype.notations.item(i)
                if s:
                    s = s + "\n "
                s = "%s<!NOTATION %s" % (s, notation.nodeName)
                if notation.publicId:
                    s = '%s PUBLIC "%s"\n "%s">' \
                        % (s, notation.publicId, notation.systemId)
                else:
                    s = '%s SYSTEM "%s">' % (s, notation.systemId)
            for i in range(doctype.entities.length):
                entity = doctype.entities.item(i)
                if s:
                    s = s + "\n "
                s = "%s<!ENTITY %s" % (s, entity.nodeName)
                if entity.publicId:
                    s = '%s PUBLIC "%s"\n "%s"' \
                        % (s, entity.publicId, entity.systemId)
                elif entity.systemId:
                    s = '%s SYSTEM "%s"' % (s, entity.systemId)
                else:
                    s = '%s "%s"' % (s, entity.firstChild.data)
                if entity.notationName:
                    s = "%s NOTATION %s" % (s, entity.notationName)
                s = s + ">"
        return s

    def _getNSattrs(self):
        return ""

    def external_entity_ref_handler(self, context, base, systemId, publicId):
        if systemId == _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID:
            # this entref is the one that we made to put the subtree
            # in; all of our given input is parsed in here.
            old_document = self.document
            old_cur_node = self.curNode
            parser = self._parser.ExternalEntityParserCreate(context)
            # put the real document back, parse into the fragment to return
            self.document = self.originalDocument
            self.fragment = self.document.createDocumentFragment()
            self.curNode = self.fragment
            try:
                parser.Parse(self._source, True)
            finally:
                self.curNode = old_cur_node
                self.document = old_document
                self._source = None
            return -1
        else:
            return ExpatBuilder.external_entity_ref_handler(
                self, context, base, systemId, publicId)


class Namespaces:
    """Mix-in class for builders; adds support for namespaces."""

    def _initNamespaces(self):
        # list of (prefix, uri) ns declarations.  Namespace attrs are
        # constructed from this and added to the element's attrs.
        self._ns_ordered_prefixes = []

    def createParser(self):
        """Create a new namespace-handling parser."""
        parser = expat.ParserCreate(namespace_separator=" ")
        parser.namespace_prefixes = True
        return parser

    def install(self, parser):
        """Insert the namespace-handlers onto the parser."""
        ExpatBuilder.install(self, parser)
        if self._options.namespace_declarations:
            parser.StartNamespaceDeclHandler = (
                self.start_namespace_decl_handler)

    def start_namespace_decl_handler(self, prefix, uri):
        """Push this namespace declaration on our storage."""
        self._ns_ordered_prefixes.append((prefix, uri))

    def start_element_handler(self, name, attributes):
        if ' ' in name:
            uri, localname, prefix, qname = _parse_ns_name(self, name)
        else:
            uri = EMPTY_NAMESPACE
            qname = name
            localname = None
            prefix = EMPTY_PREFIX
        node = minidom.Element(qname, uri, prefix, localname)
        node.ownerDocument = self.document
        _append_child(self.curNode, node)
        self.curNode = node

        if self._ns_ordered_prefixes:
            for prefix, uri in self._ns_ordered_prefixes:
                if prefix:
                    a = minidom.Attr(_intern(self, 'xmlns:' + prefix),
                                     XMLNS_NAMESPACE, prefix, "xmlns")
                else:
                    a = minidom.Attr("xmlns", XMLNS_NAMESPACE,
                                     "xmlns", EMPTY_PREFIX)
                a.value = uri
                a.ownerDocument = self.document
                _set_attribute_node(node, a)
            del self._ns_ordered_prefixes[:]

        if attributes:
            node._ensure_attributes()
            _attrs = node._attrs
            _attrsNS = node._attrsNS
            for i in range(0, len(attributes), 2):
                aname = attributes[i]
                value = attributes[i+1]
                if ' ' in aname:
                    uri, localname, prefix, qname = _parse_ns_name(self, aname)
                    a = minidom.Attr(qname, uri, localname, prefix)
                    _attrs[qname] = a
                    _attrsNS[(uri, localname)] = a
                else:
                    a = minidom.Attr(aname, EMPTY_NAMESPACE,
                                     aname, EMPTY_PREFIX)
                    _attrs[aname] = a
                    _attrsNS[(EMPTY_NAMESPACE, aname)] = a
                a.ownerDocument = self.document
                a.value = value
                a.ownerElement = node

    if __debug__:
        # This only adds some asserts to the original
        # end_element_handler(), so we only define this when -O is not
        # used.  If changing one, be sure to check the other to see if
        # it needs to be changed as well.
        #
        def end_element_handler(self, name):
            curNode = self.curNode
            if ' ' in name:
                uri, localname, prefix, qname = _parse_ns_name(self, name)
                assert (curNode.namespaceURI == uri
                        and curNode.localName == localname
                        and curNode.prefix == prefix), \
                        "element stack messed up! (namespace)"
            else:
                assert curNode.nodeName == name, \
                       "element stack messed up - bad nodeName"
                assert curNode.namespaceURI == EMPTY_NAMESPACE, \
                       "element stack messed up - bad namespaceURI"
            self.curNode = curNode.parentNode
            self._finish_end_element(curNode)


class ExpatBuilderNS(Namespaces, ExpatBuilder):
    """Document builder that supports namespaces."""

    def reset(self):
        ExpatBuilder.reset(self)
        self._initNamespaces()


class FragmentBuilderNS(Namespaces, FragmentBuilder):
    """Fragment builder that supports namespaces."""

    def reset(self):
        FragmentBuilder.reset(self)
        self._initNamespaces()

    def _getNSattrs(self):
        """Return string of namespace attributes from this element and
        ancestors."""
        # XXX This needs to be re-written to walk the ancestors of the
        # context to build up the namespace information from
        # declarations, elements, and attributes found in context.
        # Otherwise we have to store a bunch more data on the DOM
        # (though that *might* be more reliable -- not clear).
        attrs = ""
        context = self.context
        L = []
        while context:
            if hasattr(context, '_ns_prefix_uri'):
                for prefix, uri in context._ns_prefix_uri.items():
                    # add every new NS decl from context to L and attrs string
                    if prefix in L:
                        continue
                    L.append(prefix)
                    if prefix:
                        declname = "xmlns:" + prefix
                    else:
                        declname = "xmlns"
                    if attrs:
                        attrs = "%s\n %s='%s'" % (attrs, declname, uri)
                    else:
                        attrs = " %s='%s'" % (declname, uri)
            context = context.parentNode
        return attrs


class ParseEscape(Exception):
    """Exception raised to short-circuit parsing in InternalSubsetExtractor."""
    pass

class InternalSubsetExtractor(ExpatBuilder):
    """XML processor which can rip out the internal document type subset."""

    subset = None

    def getSubset(self):
        """Return the internal subset as a string."""
        return self.subset

    def parseFile(self, file):
        try:
            ExpatBuilder.parseFile(self, file)
        except ParseEscape:
            pass

    def parseString(self, string):
        try:
            ExpatBuilder.parseString(self, string)
        except ParseEscape:
            pass

    def install(self, parser):
        parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
        parser.StartElementHandler = self.start_element_handler

    def start_doctype_decl_handler(self, name, publicId, systemId,
                                   has_internal_subset):
        if has_internal_subset:
            parser = self.getParser()
            self.subset = []
            parser.DefaultHandler = self.subset.append
            parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler
        else:
            raise ParseEscape()

    def end_doctype_decl_handler(self):
        s = ''.join(self.subset).replace('\r\n', '\n').replace('\r', '\n')
        self.subset = s
        raise ParseEscape()

    def start_element_handler(self, name, attrs):
        raise ParseEscape()


def parse(file, namespaces=True):
    """Parse a document, returning the resulting Document node.

    'file' may be either a file name or an open file object.
903*cda5da8dSAndroid Build Coastguard Worker """ 904*cda5da8dSAndroid Build Coastguard Worker if namespaces: 905*cda5da8dSAndroid Build Coastguard Worker builder = ExpatBuilderNS() 906*cda5da8dSAndroid Build Coastguard Worker else: 907*cda5da8dSAndroid Build Coastguard Worker builder = ExpatBuilder() 908*cda5da8dSAndroid Build Coastguard Worker 909*cda5da8dSAndroid Build Coastguard Worker if isinstance(file, str): 910*cda5da8dSAndroid Build Coastguard Worker with open(file, 'rb') as fp: 911*cda5da8dSAndroid Build Coastguard Worker result = builder.parseFile(fp) 912*cda5da8dSAndroid Build Coastguard Worker else: 913*cda5da8dSAndroid Build Coastguard Worker result = builder.parseFile(file) 914*cda5da8dSAndroid Build Coastguard Worker return result 915*cda5da8dSAndroid Build Coastguard Worker 916*cda5da8dSAndroid Build Coastguard Worker 917*cda5da8dSAndroid Build Coastguard Workerdef parseString(string, namespaces=True): 918*cda5da8dSAndroid Build Coastguard Worker """Parse a document from a string, returning the resulting 919*cda5da8dSAndroid Build Coastguard Worker Document node. 920*cda5da8dSAndroid Build Coastguard Worker """ 921*cda5da8dSAndroid Build Coastguard Worker if namespaces: 922*cda5da8dSAndroid Build Coastguard Worker builder = ExpatBuilderNS() 923*cda5da8dSAndroid Build Coastguard Worker else: 924*cda5da8dSAndroid Build Coastguard Worker builder = ExpatBuilder() 925*cda5da8dSAndroid Build Coastguard Worker return builder.parseString(string) 926*cda5da8dSAndroid Build Coastguard Worker 927*cda5da8dSAndroid Build Coastguard Worker 928*cda5da8dSAndroid Build Coastguard Workerdef parseFragment(file, context, namespaces=True): 929*cda5da8dSAndroid Build Coastguard Worker """Parse a fragment of a document, given the context from which it 930*cda5da8dSAndroid Build Coastguard Worker was originally extracted. context should be the parent of the 931*cda5da8dSAndroid Build Coastguard Worker node(s) which are in the fragment. 
932*cda5da8dSAndroid Build Coastguard Worker 933*cda5da8dSAndroid Build Coastguard Worker 'file' may be either a file name or an open file object. 934*cda5da8dSAndroid Build Coastguard Worker """ 935*cda5da8dSAndroid Build Coastguard Worker if namespaces: 936*cda5da8dSAndroid Build Coastguard Worker builder = FragmentBuilderNS(context) 937*cda5da8dSAndroid Build Coastguard Worker else: 938*cda5da8dSAndroid Build Coastguard Worker builder = FragmentBuilder(context) 939*cda5da8dSAndroid Build Coastguard Worker 940*cda5da8dSAndroid Build Coastguard Worker if isinstance(file, str): 941*cda5da8dSAndroid Build Coastguard Worker with open(file, 'rb') as fp: 942*cda5da8dSAndroid Build Coastguard Worker result = builder.parseFile(fp) 943*cda5da8dSAndroid Build Coastguard Worker else: 944*cda5da8dSAndroid Build Coastguard Worker result = builder.parseFile(file) 945*cda5da8dSAndroid Build Coastguard Worker return result 946*cda5da8dSAndroid Build Coastguard Worker 947*cda5da8dSAndroid Build Coastguard Worker 948*cda5da8dSAndroid Build Coastguard Workerdef parseFragmentString(string, context, namespaces=True): 949*cda5da8dSAndroid Build Coastguard Worker """Parse a fragment of a document from a string, given the context 950*cda5da8dSAndroid Build Coastguard Worker from which it was originally extracted. context should be the 951*cda5da8dSAndroid Build Coastguard Worker parent of the node(s) which are in the fragment. 
952*cda5da8dSAndroid Build Coastguard Worker """ 953*cda5da8dSAndroid Build Coastguard Worker if namespaces: 954*cda5da8dSAndroid Build Coastguard Worker builder = FragmentBuilderNS(context) 955*cda5da8dSAndroid Build Coastguard Worker else: 956*cda5da8dSAndroid Build Coastguard Worker builder = FragmentBuilder(context) 957*cda5da8dSAndroid Build Coastguard Worker return builder.parseString(string) 958*cda5da8dSAndroid Build Coastguard Worker 959*cda5da8dSAndroid Build Coastguard Worker 960*cda5da8dSAndroid Build Coastguard Workerdef makeBuilder(options): 961*cda5da8dSAndroid Build Coastguard Worker """Create a builder based on an Options object.""" 962*cda5da8dSAndroid Build Coastguard Worker if options.namespaces: 963*cda5da8dSAndroid Build Coastguard Worker return ExpatBuilderNS(options) 964*cda5da8dSAndroid Build Coastguard Worker else: 965*cda5da8dSAndroid Build Coastguard Worker return ExpatBuilder(options) 966
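

# Usage sketch (an illustrative addition, not part of the upstream module).
#
# The convenience functions above differ mainly in which builder they pick,
# so a small example may help show the effect of the 'namespaces' flag and
# of passing an open file object to parse().  This is a minimal sketch under
# stated assumptions: it assumes the standard-library copy of this module is
# importable as xml.dom.expatbuilder, and _demo and its sample XML are
# hypothetical names chosen only for illustration.
def _demo():
    import io
    from xml.dom import expatbuilder

    source = "<root xmlns:a='urn:x'><a:item>hi</a:item></root>"

    # Namespace-aware parse (the default): prefixes are resolved to URIs.
    ns_doc = expatbuilder.parseString(source)
    ns_item = ns_doc.documentElement.firstChild

    # Namespace-unaware parse: the prefix remains part of the tag name only.
    plain_doc = expatbuilder.parseString(source, namespaces=False)
    plain_item = plain_doc.documentElement.firstChild

    # parse() accepts an open binary file object as well as a file name.
    file_doc = expatbuilder.parse(io.BytesIO(source.encode("utf-8")))

    return (ns_item.namespaceURI,
            plain_item.namespaceURI,
            file_doc.documentElement.tagName)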