Saturday 15 January 2011

python - Remove unnecessary repeated tags with BeautifulSoup


I am using Python and Beautiful Soup to extract some text from HTML. Some of the HTML contains text of the form:

  <h3><b>ABC</b> <b>DEF</b></h3>

I want to remove the repeated b tag. Is there a quick way to do this?

This works fine in BS4; the text attribute of the h3 tag returns the contents without the b tags:

  In [4]: soup.h3
  Out[4]: <h3><b>ABC</b> <b>DEF</b></h3>

  In [5]: soup.h3.text
  Out[5]: u'ABC DEF'
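If the goal is to keep the markup but collapse the repeated b tags into one, something along these lines might work. This is only a rough sketch: merge_repeated_tags is a made-up helper name, not part of BeautifulSoup, and it assumes the repeated tags are direct siblings separated by nothing more than whitespace.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def merge_repeated_tags(soup, name):
    """Merge runs of sibling <name> tags into the first tag of each run.

    Hypothetical helper (not a BeautifulSoup API). Two tags count as
    repeated when they are adjacent siblings, optionally separated by a
    whitespace-only text node.
    """
    for tag in soup.find_all(name):
        if tag.parent is None:
            continue  # already merged into an earlier sibling
        while True:
            gap = tag.next_sibling
            nxt = gap
            if isinstance(gap, NavigableString) and not gap.strip():
                nxt = gap.next_sibling  # skip over the whitespace
            else:
                gap = None
            if not (isinstance(nxt, Tag) and nxt.name == name):
                break
            if gap is not None:
                tag.append(gap)  # keep the separating whitespace
            for child in list(nxt.contents):
                tag.append(child)  # append() moves the child
            nxt.extract()  # drop the now-empty duplicate tag
    return soup


soup = BeautifulSoup("<h3><b>ABC</b> <b>DEF</b></h3>", "html.parser")
merge_repeated_tags(soup, "b")
print(soup.h3)  # <h3><b>ABC DEF</b></h3>
```

Tags separated by real text (for example <b>A</b> and <b>B</b>) are left alone, since merging them would change what is bold.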

See the documentation and package here:
