Tuesday 15 February 2011

python - Scrapy: How to make a conditional (present or absent) XPath return a value when absent?


I am trying to scrape information about particular products from a website. One of the XPath expressions I want, however, does not match on every product page. (All products have a name, a price, etc., but some do not show a recommended age.)

This would not be a problem, except that when Scrapy writes the data out, or even prints it in the shell, the values are no longer associated with the list of start URLs, and nothing records that the field was absent for some URLs. As a result, my other data (multiple columns of different variables) does not line up with the new age column, because that column is too short and out of order. This does not happen when I restrict the crawl to products that display an age.

Is there a way to make a page that lacks the desired XPath return an empty value for the age, so that the columns in my data stay aligned?

Here is my XPATH selector:

  items["age"] = hxs.select('//li[contains(@class, "our-age")]/span/text()').extract()

(Some pages have no age, and on those pages this path is missing entirely.)

  xpath = '//li[contains(@class, "our-age")]/span/text()'
  items["age"] = hxs.select(xpath).extract() or ['']
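The fallback works because extract() returns a list, and an empty list is falsy in Python, so `or ['']` substitutes a one-element placeholder whenever nothing matched. Here is a minimal sketch of the same idea that runs without Scrapy. It uses the standard library's ElementTree in place of the selector, and the URLs, the markup, the `our-name` class, and the helper names `extract_texts` and `scrape` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical product pages (invented, well-formed snippets).
# The second page has no "our-age" element at all.
PAGES = {
    "http://example.com/a": (
        "<ul><li class='our-name'><span>Robot</span></li>"
        "<li class='our-age'><span>3+</span></li></ul>"
    ),
    "http://example.com/b": (
        "<ul><li class='our-name'><span>Puzzle</span></li></ul>"
    ),
}


def extract_texts(root, css_class):
    """Stand-in for selector.extract(): a list of matched strings, possibly empty."""
    path = ".//li[@class='{}']/span".format(css_class)
    return [span.text for span in root.findall(path)]


def scrape(url, markup):
    """Build one row per page; a missing field becomes [''] instead of []."""
    root = ET.fromstring(markup)
    return {
        "url": url,
        "name": extract_texts(root, "our-name") or [""],
        "age": extract_texts(root, "our-age") or [""],
    }


rows = [scrape(url, markup) for url, markup in PAGES.items()]

for row in rows:
    print(row["url"], row["name"][0], repr(row["age"][0]))
```

Every page now contributes exactly one age value, so the age column has the same length and order as the other columns.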
