Giuseppe: python - a (presumably basic) web scraping of http://www.ssa.gov/cgi-bin/popularnames.cgi in urllib

Sunday, 15 July 2012


I am very new to Python (and to web scraping), so let me ask a question.

Many websites do not show a distinct URL in Firefox or other browsers for each page of results. For example, the Social Security Administration lists popular baby names with their ranks (since 1880). When I change the year from 1880 to 1881, however, the URL does not change. It stays the same.

Because I do not know the specific URL, I cannot download the page using urllib.

The source of this page includes:

    <input type="text" name="year" id="yob" size="4" value="1880">

It seems that if I could control this "year" value (e.g., "1881" or "1991"), I could solve the problem. However, I do not know how to do it.
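You do not have to read the page source by hand to find a form's field names; you can also extract them in code. Below is a minimal Python 3 sketch that uses the standard library's `html.parser`. `FormFieldCollector` is a hypothetical helper name, and the input is the `<input>` tag quoted above.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)  # attrs is a list of (name, value) pairs
            name = attributes.get("name")
            if name:
                # value may be missing (None); strip() drops stray padding spaces
                self.fields[name] = (attributes.get("value") or "").strip()


snippet = '<input type="text" name="year" id="yob" size="4" value="1880">'
collector = FormFieldCollector()
collector.feed(snippet)
print(collector.fields)  # {'year': '1880'}
```

If you feed it the whole page source instead, it lists every named input on the page. Fields that JavaScript adds will not show up this way, so watching the browser's network traffic remains the more reliable check.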

Can anyone tell me the solution for this?

If you know of any websites that could help with my studies, please let me know.

Thank you!

You can still use urllib. The button sends a POST request to the same URL. Using Firefox, I looked at the network traffic and found that the form sends 3 parameters: member, top, and year. You can send the same arguments:
    import urllib

    url = 'http://www.ssa.gov/cgi-bin/popularnames.cgi'
    post_params = {
        # 'member' was empty, so I left it out
        'top': '25',
        'year': year,  # set by the loop further down
    }
    post_args = urllib.urlencode(post_params)

Now just send the URL-encoded arguments:
    # passing data as the second argument makes this a POST request
    response = urllib.urlopen(url, post_args)
    html = response.read()

If you also need to send headers:
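    import urllib2

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive',
        'Host': 'www.ssa.gov',
        'Referer': 'http://www.ssa.gov/cgi-bin/popularnames.cgi',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0',
    }
    # urllib.urlopen() cannot take headers, so POST the data through urllib2 instead
    request = urllib2.Request(url, post_args, headers)
    response = urllib2.urlopen(request)

Then run the code in a loop:

    for year in xrange(1880, 2014):
        # the code above, once per year
        post_args = urllib.urlencode({'top': '25', 'year': year})
        request = urllib2.Request(url, post_args, headers)
        html = urllib2.urlopen(request).read()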
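In Python 3, `urllib.urlencode` moved to `urllib.parse`, and `urllib2` was folded into `urllib.request`. Below is a minimal Python 3 sketch of the same approach. It assumes the form still accepts the `top` and `year` fields observed above, which may no longer be true because the site may have changed since this answer was written. `build_request` and `fetch_all_years` are hypothetical helper names. Building each `Request` separately also shows why the browser's URL never changes: only the POST body differs from year to year.

```python
import urllib.parse
import urllib.request

URL = "http://www.ssa.gov/cgi-bin/popularnames.cgi"

# A subset of the browser headers from the answer; urllib.request sets Host itself.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0",
    "Referer": URL,
}


def build_request(year, top=25):
    """Build (but do not send) the POST request the form would submit for one year."""
    # Field names 'top' and 'year' are the ones seen in the browser's network traffic.
    body = urllib.parse.urlencode({"top": top, "year": year}).encode("ascii")
    # Supplying data= makes urllib.request issue a POST instead of a GET.
    return urllib.request.Request(URL, data=body, headers=HEADERS)


def fetch_all_years(first=1880, last=2013):
    """Send one POST per year and return {year: html}. This performs network I/O."""
    pages = {}
    for year in range(first, last + 1):
        with urllib.request.urlopen(build_request(year), timeout=30) as response:
            pages[year] = response.read().decode("utf-8", errors="replace")
    return pages


if __name__ == "__main__":
    req = build_request(1881)
    print(req.get_method(), req.full_url)  # POST http://www.ssa.gov/cgi-bin/popularnames.cgi
    print(req.data)                        # b'top=25&year=1881'
```

Calling `fetch_all_years()` with its defaults sends 134 requests, one per year from 1880 to 2013. Adding a short pause between requests is kinder to the server.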




Posted by Unknown at 03:22