Monday 15 September 2014

python - Parsing HTML forms input tag with Beautiful Soup


Well, I need to parse HTML forms with the "input" tag; let's say, extract those with the "text" type.

I have this code:

    from BeautifulSoup import BeautifulSoup as beatsop

    html_data = open("forms.html")

    def html_parser(html_data):
        html_proc = beatsop(html_data)
        # Extract the text inputs.
        txtinput = html_proc.findAll('input', {'type': 'text'})
        # Extract any kind of input that is not text.
        listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
        otrimput = html_proc.findAll('input', {'type': listform})

    html_parser(html_data)

I'm using it with a local document, but you can use urllib to do the same with the form of any web page. Now, the problem: I want to extract the "value" attribute of the non-text inputs and the "name" attribute of the text ones. Does anyone know how I can do this?

Thank you!

To access an attribute, use element['attribute'].

    from BeautifulSoup import BeautifulSoup as beatsop

    def html_parser(html_data):
        html_proc = beatsop(html_data)
        # Extract the text inputs.
        txtinput = html_proc.findAll('input', {'type': 'text'})
        # Extract any kind of input that is not text.
        listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
        otrimput = html_proc.findAll('input', {'type': listform})

        print('Text input names:')
        for elem in txtinput:
            print(elem['name'])

        print('Non-text input values:')
        for elem in otrimput:
            # .get() returns None when the attribute is missing
            value = elem.get('value')
            if value:
                print(value)
            else:
                print('{} has no value'.format(elem))

    with open("forms.html") as html_data:
        html_parser(html_data)
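
For comparison, the same name and value lookups can be sketched with only the standard library's html.parser module instead of Beautiful Soup, which may be handy for trying the idea on a small inline form. This is a swapped-in technique, not the approach from the answer above; the InputCollector class, the NON_TEXT_TYPES set, and the sample form are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample form, made up for this sketch.
SAMPLE_FORM = """
<form>
  <input type="text" name="username">
  <input type="hidden" name="token" value="abc123">
  <input type="checkbox" name="remember" value="yes">
  <input type="file" name="upload">
</form>
"""

# Same non-text input types as in the listform list above.
NON_TEXT_TYPES = {"radio", "checkbox", "password", "file", "image", "hidden"}


class InputCollector(HTMLParser):
    """Collect names of text inputs and values of non-text inputs."""

    def __init__(self):
        super().__init__()
        self.text_names = []
        self.other_values = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        # attrs arrives as a list of (name, value) pairs
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if input_type == "text":
            self.text_names.append(attributes.get("name"))
        elif input_type in NON_TEXT_TYPES:
            # .get() yields None when the input has no value attribute
            self.other_values.append(attributes.get("value"))


collector = InputCollector()
collector.feed(SAMPLE_FORM)
print(collector.text_names)
print(collector.other_values)
```

As in the Beautiful Soup version, a missing attribute shows up as None rather than raising an error, so the caller can decide how to report inputs without a value.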
