I am trying to crawl webpages and check whether the website owner allows visitors to contact them or not.
This is the function that every thread calls:

    def getpage():
        try:
            curl = urls.pop(0)
            print("working on " + str(curl))
            thepage1 = requests.get(curl).text
            global ctot
            if "contact us" in thepage1:
                slist.write("\n" + curl)
                ctot = ctot + 1
        except:
            pass
        finally:
            # Calls itself again while URLs remain
            if len(urls) > 0:
                getpage()

But this keeps increasing the program's memory use (pythonw.exe), as the thread calls the function again whenever the condition is true. The memory use should at least stay at the same level.
For a list containing 100k URLs, the program takes more than 3 GB of memory and keeps growing.
Your program is recursive for no reason. Recursion means that for each page you create a new set of local variables, and because the function never returns, those variables are still referenced by the function's locals. Garbage collection therefore never comes into play, and memory use keeps growing.
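To make the point above concrete, here is a minimal sketch that counts how many "page" objects are alive at once under recursion versus a loop. The names `Page`, `count_alive`, `recursive` and `iterative` are hypothetical stand-ins, not part of the original code, and the exact counts assume CPython's reference counting.

```python
import weakref


class Page:
    """Hypothetical stand-in for a downloaded page body."""

    def __init__(self, n):
        self.n = n


def count_alive(refs):
    # A weak reference returns None once its object has been freed.
    return sum(1 for r in refs if r() is not None)


def recursive(n, refs, peak):
    page = Page(n)  # local variable, like thepage1 in the question
    refs.append(weakref.ref(page))
    if n > 1:
        recursive(n - 1, refs, peak)  # this frame stays alive meanwhile
    else:
        peak.append(count_alive(refs))  # measured at the deepest call


def iterative(n, refs, peak):
    while n > 0:
        page = Page(n)  # rebinding the name releases the previous page
        refs.append(weakref.ref(page))
        peak.append(count_alive(refs))
        n -= 1


rec_refs, rec_peak = [], []
recursive(50, rec_refs, rec_peak)

it_refs, it_peak = [], []
iterative(50, it_refs, it_peak)

print("recursive: pages alive at the deepest call:", rec_peak[0])
print("iterative: most pages alive at once:", max(it_peak))
```

In the recursive version every outer call still holds its own `page`, so all of them are alive together; in the loop only the current one is.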
Use a loop instead:

    def getpage():
        global ctot
        # Keep taking URLs until the shared list is empty
        while len(urls) > 0:
            try:
                curl = urls.pop(0)
                thepage1 = requests.get(curl).text
                if "contact us" in thepage1:
                    slist.write("\n" + curl)
                    ctot = ctot + 1
            except Exception:
                # Skip URLs that fail to download
                pass
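Since several threads run this function, one possible variation is to hand out URLs through a `queue.Queue` and guard the shared result list with a lock. This is only a sketch under assumptions: `run_workers`, `fetch` and `num_threads` are hypothetical names that do not appear in the original, and `fetch` is passed in so the example runs without network access.

```python
import queue
import threading


def run_workers(urls, fetch, num_threads=4):
    """Return the URLs whose page text contains "contact us".

    fetch is any callable that takes a URL and returns the page text.
    """
    todo = queue.Queue()
    for url in urls:
        todo.put(url)

    matches = []
    lock = threading.Lock()

    def worker():
        # A plain loop: no recursion, so no frames pile up.
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                text = fetch(url)
            except Exception:
                continue  # skip URLs that fail to download
            if "contact us" in text:
                with lock:
                    matches.append(url)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return matches


# Fake pages so the sketch runs offline.
fake_pages = {
    "http://a.example": "please contact us here",
    "http://b.example": "nothing to see",
}
found = run_workers(list(fake_pages), fake_pages.__getitem__)
print(found)
```

With the requests library, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`.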