Monday, 15 March 2010

lxml - How to pretty print an XML file in Python?


I want to clean up a complex XML file using lxml. The problem lies in the tail of many of its elements. For example, given XML like this:

    <body><part>n</part></body>

I need to arrange it like this:

    <body>
      <part>n</part>
    </body>

First I tried pretty_print together with a parser created with remove_blank_text=True, but it did not work:

    import lxml.etree as ET

    xml_doc = '<body><part>n</part></body>'
    parser = ET.XMLParser(remove_blank_text=True)
    root = ET.fromstring(xml_doc, parser)
    print(ET.tostring(root, pretty_print=True))
    >>> b'<body><part>n</part></body>\n'

Then I tried again without the custom parser, with no better result:

    import lxml.etree as ET

    xml_doc = '<body><part>n</part></body>'
    root = ET.fromstring(xml_doc)
    print(ET.tostring(root, pretty_print=True))
    >>> b'<body><part>n</part></body>\n'

If the pretty_print feature does not help, you can probably write your own recursive method to pretty-print the tree, something along the lines of:

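    def pprint(root, indentTabs=0):
        # Opening tag, one tab per nesting level
        print("%s<%s>" % (indentTabs * "\t", root.tag))
        # Element text, skipped when empty or only whitespace
        if root.text and root.text.strip():
            print((indentTabs + 1) * "\t" + root.text.strip())
        for element in root:  # recurse into child elements
            pprint(element, indentTabs + 1)
        print("%s</%s>" % (indentTabs * "\t", root.tag))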

There may be some ready-made options for this, though. The method above only takes care of tags; you will have to add code to take care of XML attributes.

Edit: The output will be printed in the format

    <tag>
    text
    </tag>
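
To handle attributes as well, the recursive method could be extended roughly as follows. This is only a sketch: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, the name pprint_lines is made up for this example, and it assumes the tree contains only ordinary elements (no comments or processing instructions).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pprint_lines(element, depth=0, indent="\t"):
    """Yield the pretty-printed lines for element and its descendants."""
    pad = indent * depth
    # Render attributes as name="value" pairs inside the opening tag.
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in element.attrib.items()
    )
    yield "%s<%s%s>" % (pad, element.tag, attrs)
    # Text goes on its own line, one level deeper; blank text is skipped.
    text = (element.text or "").strip()
    if text:
        yield pad + indent + escape(text)
    for child in element:
        yield from pprint_lines(child, depth + 1, indent)
        # Tail text belongs to the parent, so it is printed at the
        # same level as the parent's own text.
        tail = (child.tail or "").strip()
        if tail:
            yield pad + indent + escape(tail)
    yield "%s</%s>" % (pad, element.tag)


# Hypothetical sample document, used only to show the call.
root = ET.fromstring('<body id="1"><part>n</part></body>')
print("\n".join(pprint_lines(root)))
```

Yielding lines instead of printing them directly makes it easy to write the result to a file or compare it in a test. Whitespace-only text and tails are dropped, which is what lets the indentation stay regular.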

You can modify it according to your requirement.
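
As for the ready-made options mentioned earlier, one possibility is sketched below. It assumes Python 3.9 or newer and uses the standard library's xml.etree.ElementTree rather than lxml: whitespace-only text and tail values are cleared first, then ET.indent adds the indentation. The helper names strip_blank_whitespace and pretty_xml are made up for this example.

```python
import xml.etree.ElementTree as ET


def strip_blank_whitespace(root):
    """Clear text and tail values that consist only of whitespace."""
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def pretty_xml(xml_text, space="  "):
    """Parse xml_text and return it re-serialized with indentation."""
    root = ET.fromstring(xml_text)
    strip_blank_whitespace(root)
    ET.indent(root, space=space)  # available since Python 3.9
    return ET.tostring(root, encoding="unicode")


# Hypothetical input with a whitespace-only text and tail.
print(pretty_xml("<body><part>\n</part>\n</body>"))
```

Clearing the blank text of leaf elements matters because ET.indent only rewrites the whitespace around child elements and leaves a leaf's own text untouched. Elements with real text keep it unchanged.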
