I have a string that has been cleaned with lxml's Cleaner, so all the links are now in the form <a href="...">content</a>. I would like to strip out all links that do not have an href attribute, e.g.
<a rel="nofollow">Link to be removed</a> should become:
Link to be removed
The same for:
<a>Another link to be removed</a> should become:
Another link to be removed
In short, all links that are missing the href attribute. It does not have to be a regex, but since lxml gives a clean markup structure, it should be possible. What I need is a string with this type of non-functional a tag stripped out.
Use the drop_tag method.

import lxml.html

root = lxml.html.fromstring('<div>test <a rel="nofollow">link to be <b>removed</b></a> and <a href="#">link</a></div>')
# Select every <a> child of the root that has no href attribute
for a in root.xpath('a[not(@href)]'):
    a.drop_tag()
# root now serializes as: <div>test link to be <b>removed</b> and <a href="#">link</a></div>

.drop_tag(): Drops the tag, but keeps its children and text.