I have a large file (400 MB) that contains hundreds of XML documents, each with its own XML declaration. I'm trying to parse each document using ElementTree in Python, but I'm having trouble splitting the file into individual XML documents so that I can parse them. Here is an example of the file:
Ideally I would like to read up to each XML declaration, parse the data, and then continue with the next XML document. Any suggestions would help.
You will need to read the documents separately. Here is a generator function that yields each complete XML document from a given file object:
    def xml_documents(fileobj):
        document = []
        for line in fileobj:
            # A new XML declaration starts the next document
            if line.strip().startswith('<?xml') and document:
                yield ''.join(document)
                document = []
            document.append(line)
        # Yield the last document in the file
        if document:
            yield ''.join(document)

Then use ElementTree.fromstring() to load and parse each one:

    from xml.etree import ElementTree

    with open('file_with_multiple_xmldocuments') as fileobj:
        for xml in xml_documents(fileobj):
            tree = ElementTree.fromstring(xml)
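To try the approach without the real file, here is a minimal self-contained sketch. The in-memory sample, the `doc` and `title` element names, and the `id` attribute are made up for illustration; the question's actual file layout is not shown. It also assumes every XML declaration starts on its own line, which the line-based split depends on.

```python
import io
from xml.etree import ElementTree


def xml_documents(fileobj):
    """Yield each complete XML document in fileobj as one string."""
    document = []
    for line in fileobj:
        # A new XML declaration marks the start of the next document.
        if line.lstrip().startswith('<?xml') and document:
            yield ''.join(document)
            document = []
        document.append(line)
    # Emit whatever remains after the last declaration.
    if document:
        yield ''.join(document)


# Hypothetical sample standing in for the real 400 MB file.
sample = io.StringIO(
    '<?xml version="1.0"?>\n'
    '<doc id="1"><title>first</title></doc>\n'
    '<?xml version="1.0"?>\n'
    '<doc id="2"><title>second</title></doc>\n'
)

titles = []
for xml in xml_documents(sample):
    root = ElementTree.fromstring(xml)
    titles.append((root.get('id'), root.findtext('title')))

print(titles)
```

Because the generator holds only one document's lines at a time, memory use should track the largest single document rather than the whole file.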