Tuesday, 15 March 2011

Screen scraping with Selenium in Python: links constructed by JavaScript


I am creating a web crawler using Selenium and Python and I have run into a bit of trouble. The crawler finds links using:

listlinkerHref = self.browser.find_elements_by_xpath("//*[@href]")

and then reading the href attribute of each element in listlinkerHref. This works well for classic links that have an href attribute. However, a quick glance at the source code for the homepage of www.primitiveworldproductions.com, between lines 110 and 135 (approx.), shows a bunch of links built with JavaScript, with no href attributes in sight.
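For reference, the href-based collection step described above could look roughly like the sketch below. The function name collect_href_links is hypothetical. It passes the locator strategy as the plain string "xpath" (the value behind Selenium's By.XPATH constant) so that it does not need the find_elements_by_xpath helper, which newer Selenium 4 releases no longer provide. It only assumes that the driver object offers find_elements and that each element offers get_attribute.

```python
def collect_href_links(driver):
    """Return the unique, non-empty href values found on the current page.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible find_elements method). This helper is illustrative only.
    """
    links = []
    # "xpath" is the string value of selenium's By.XPATH locator strategy.
    for element in driver.find_elements("xpath", "//*[@href]"):
        href = element.get_attribute("href")
        # Skip missing or empty values and keep first-seen order without duplicates.
        if href and href not in links:
            links.append(href)
    return links
```

Inside the crawler this would be called as collect_href_links(self.browser). As the question notes, it only sees elements that carry an href attribute.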

I know next to nothing about JavaScript, and I have looked through the Selenium docs, but I cannot find any way to locate those links. Is there a robust, universal way of finding all links in the source code, including those created by JavaScript functions without href attributes? Note that my crawler does not work by clicking on links (it just adds them to a list to be opened later), and it needs to be able to crawl any site, without site-specific special cases. Is this possible?

Edit:

Here are a few lines from the part of the source code in question.

  var n111 = menuMgr.createMenu("ref111");
  n111.addItem("126", "Staff bios", "/staff.aspx", ["system/nlsmenu/img/submenuovr.gif", "system/nlsmenu/img/submenuovr.gif"], true, null, "ref126");
  var n112 = menuMgr.createMenu("ref112");
  n112.addItem("146", "Promotional Video", "/promotionalvideo.aspx", ["system/nlsmenu/img/submenuovr.gif", "system/nlsmenu/img/submenuovr.gif"], true, null, "ref146");
  n112.addItem("120", "Video for social media", "/vsm.aspx", ["system/nlsmenu/img/submenuovr.gif", "system/nlsmenu/img/submenuovr.gif"], true, null, "ref120");
  n112.addItem("147", "Live Webcasting and Event Video", "/webcasting.aspx", ["system/nlsmenu/img/submenuovr.gif", "system/nlsmenu/img/submenuovr.gif"], true, null, "ref147");

If you right-click any item in the menu and select "Inspect Element", you will see the HTML generated by the JavaScript. You will notice that the menu items at primitiveworldproductions.com have no href attribute and that the link target is loaded by the onclick event. I'm afraid there is no easy way to extract links from this menu.
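That said, a crude heuristic can sometimes recover such targets without clicking anything: scan the raw page source (for example self.browser.page_source) for quoted string literals that look like page URLs and resolve them against the page address. The sketch below is only that, a heuristic. The function name, the list of page extensions, and the list of asset extensions are all assumptions made for illustration. It will miss URLs that are assembled at runtime and may pick up strings that are not links at all.

```python
import re
from urllib.parse import urljoin

# A quoted literal that either starts with "/" or "http(s)://", or ends in a
# common page extension. The extension list is an assumption, not exhaustive.
_CANDIDATE = re.compile(
    r"""(?P<q>["'])"""
    r"""(?P<url>(?:https?://|/)[^"'\s<>]*"""
    r"""|[^"'\s<>]+\.(?:aspx?|html?|php|jsp)(?:[?#][^"'\s<>]*)?)"""
    r"""(?P=q)""",
    re.IGNORECASE,
)

# Static assets (images, stylesheets, scripts) are not pages worth crawling.
_ASSET = re.compile(r"\.(?:gif|png|jpe?g|svg|ico|css|js)(?:[?#].*)?$", re.IGNORECASE)


def extract_js_link_candidates(page_source, base_url):
    """Return absolute URLs for URL-like string literals found in page_source."""
    found = []
    for match in _CANDIDATE.finditer(page_source):
        literal = match.group("url")
        if _ASSET.search(literal):
            continue
        absolute = urljoin(base_url, literal)
        if absolute not in found:
            found.append(absolute)
    return found
```

Applied to menu code like the lines quoted in the question, this picks up "/staff.aspx" and "/vsm.aspx" while skipping the .gif image paths and the "ref..." identifiers. Merging its output with the href-based list would give the crawler more candidates to open later, at the cost of some false positives.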
