Monday, 15 August 2011

redirect - how to follow meta refreshes in Python


Python's urllib2 follows 3xx redirects automatically to obtain the final content. Is there a way to make urllib2 (or some other library) follow meta refreshes too? Or do I need to parse the HTML manually for the refresh meta tag?

Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication):

  import re

  import BeautifulSoup
  import httplib2

  def meta_redirect(content):
      """Return the target URL of a meta refresh tag in content, or None."""
      soup = BeautifulSoup.BeautifulSoup(content)
      # Match "refresh" case-insensitively, since pages also write "Refresh"
      result = soup.find("meta", attrs={"http-equiv": re.compile("^refresh$", re.I)})
      if result:
          # The content attribute looks like "5; url=http://example.com/next"
          wait, _, text = result["content"].partition(";")
          text = text.strip()
          if text.lower().startswith("url="):
              return text[4:]
      return None

  def get_content(url, key, certificate):
      h = httplib2.Http(".cache")
      h.add_certificate(key, certificate, "")
      resp, content = h.request(url, "GET")

      # Follow the chain of meta redirects
      target = meta_redirect(content)
      while target:
          resp, content = h.request(target, "GET")
          target = meta_redirect(content)
      return content
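
If BeautifulSoup is not available, the parsing step could probably be done with the standard library alone. The following is only a rough sketch for Python 3, not part of the original answer: the names MetaRefreshParser and find_meta_refresh are made up for illustration, and the optional base_url argument is an added assumption so that relative targets can be resolved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh_url is not None:
            return
        attributes = dict(attrs)  # HTMLParser lowercases attribute names
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.com/next"
        _wait, _sep, text = (attributes.get("content") or "").partition(";")
        text = text.strip()
        if text.lower().startswith("url="):
            self.refresh_url = text[4:].strip("'\" ")


def find_meta_refresh(html, base_url=""):
    """Return the meta refresh target resolved against base_url, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.refresh_url is None:
        return None
    return urljoin(base_url, parser.refresh_url)
```

A caller would fetch a page, pass its decoded body and URL to find_meta_refresh, and repeat while a target is returned. It is probably wise to cap the number of hops so that two pages refreshing to each other cannot loop forever.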
