I am using Python and Beautiful Soup to extract some text from HTML. Some of my HTML has this form:

    <h3><b>ABC</b> <b>DEF</b></h3>

I want to get rid of the repeated <b> tags. Is there a quick way to do this?
This works fine with BS4 (Beautiful Soup 4):
    In [4]: soup.h3
    Out[4]: <h3><b>ABC</b> <b>DEF</b></h3>

    In [5]: soup.h3.text
    Out[5]: u'ABC DEF'

See the documentation and package here:
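As a self-contained Python 3 illustration, here is a minimal sketch. It assumes the markup is exactly `<h3><b>ABC</b> <b>DEF</b></h3>` and uses the built-in `html.parser` backend. `get_text()` (which `.text` calls) returns only the text. If you would rather keep the `<h3>` element and strip the `<b>` tags out of the tree itself, `Tag.unwrap()` replaces each tag with its contents.

```python
from bs4 import BeautifulSoup

# Assumed input: the snippet from the question.
html = "<h3><b>ABC</b> <b>DEF</b></h3>"
soup = BeautifulSoup(html, "html.parser")

# Option 1: read just the text; the nested <b> tags are ignored.
text = soup.h3.get_text()
print(text)  # ABC DEF

# Option 2: remove the <b> tags from the tree but keep their contents.
for b in soup.h3.find_all("b"):
    b.unwrap()
print(soup.h3)  # <h3>ABC DEF</h3>
```

Option 1 leaves the parsed tree untouched. Option 2 modifies `soup` in place, so use it when you still need the surrounding HTML.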