Saturday, 15 September 2012

Parsing a large combined XML document with Python


I have a large file (400 MB) that contains hundreds of XML documents, each with its own XML declaration. I'm trying to parse each document using ElementTree in Python, but I'm having a lot of trouble splitting the file into individual XML documents so I can parse the information in them. Here is an example of what the file looks like:

    

Ideally, I would like to read through the file one XML declaration at a time, parse the data, and continue with the next XML document. Any suggestions would help.

You will need to read the documents separately. Here is a generator function that yields each complete XML document from a given file object:

    from xml.etree import ElementTree

    def xml_documents(fileobj):
        document = []
        for line in fileobj:
            # A new XML declaration means the previous document is complete.
            if line.strip().startswith('<?xml') and document:
                yield ''.join(document)
                document = []
            document.append(line)
        # Yield the last document, which has no declaration following it.
        if document:
            yield ''.join(document)

Then use ElementTree.fromstring() to load and parse each one:

    with open('file_with_multiple_xmldocuments') as fileobj:
        for xml in xml_documents(fileobj):
            tree = ElementTree.fromstring(xml)
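To see the splitting approach work end to end, here is a minimal self-contained sketch that runs it against a small in-memory stream instead of a 400 MB file. The sample data, the `report` and `title` element names, and the function name `split_xml_documents` are made up for illustration; the asker's real documents will look different. It also assumes that every `<?xml` declaration starts on its own line, which is what the line-based check relies on.

```python
import io
from xml.etree import ElementTree


def split_xml_documents(fileobj):
    """Yield each concatenated XML document in fileobj as one string."""
    document = []
    for line in fileobj:
        # A new declaration marks the start of the next document.
        if line.lstrip().startswith('<?xml') and document:
            yield ''.join(document)
            document = []
        document.append(line)
    if document:
        # Emit the final document, which has no declaration after it.
        yield ''.join(document)


# Hypothetical sample: two small documents concatenated in one stream.
sample = io.StringIO(
    '<?xml version="1.0"?>\n'
    '<report id="1"><title>First</title></report>\n'
    '<?xml version="1.0"?>\n'
    '<report id="2"><title>Second</title></report>\n'
)

titles = []
for xml in split_xml_documents(sample):
    root = ElementTree.fromstring(xml)
    titles.append((root.get('id'), root.findtext('title')))

print(titles)  # [('1', 'First'), ('2', 'Second')]
```

Because the generator holds only one document's lines at a time, memory use should track the size of the largest individual document rather than the whole file.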
