Tuesday 15 June 2010

python - Scrapy: Return only first XPATH match (one result per page)


I am scraping several XPaths (variables) from a webpage, and only one of these XPaths has multiple matches per page. However, I need a single value from it so that my results line up one line per page.

How can I get only the first match of my XPath selector on each page before moving on to the next start_url? Thanks!

    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        # The item class name is illegible in the source; PageItem stands in
        # for the asker's own scrapy Item subclass.
        item = PageItem()
        # This is the XPath that can match more than once per page.
        item["age"] = hxs.select('//li[contains(@class, "our age")]/span/text()').re(r'\n\s*(.*)\n') or ['']
        item["product"] = hxs.select('//div[@id="price-review-age"]/h1/text()').extract()
        item["value"] = hxs.select('//dd[contains(@class, "our")]/text()').re(r'\n\s*(.*)\n')
        item["availability"] = hxs.select('//div[contains(@class, "in-stock")]/text()').extract()
        items.append(item)
        return items

I'm not completely sure what you are asking. You can select the first match like this: "/path/element[1]"

Or maybe you want this:

"//elementlookingfor[preceding::start[@id='start_url'][1]]"

That selects every "elementlookingfor" that comes after the element "start" whose id attribute is "start_url".
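
To illustrate the first suggestion ("/path/element[1]"), here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree instead of Scrapy, purely so it runs without extra installs, and the markup and text values are made up. One thing worth noticing: a positional predicate such as li[1] means "the first li under each parent", so it can still return several nodes when the page has several lists. Taking the first result in Python gives exactly one value per page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: the "age" entry appears more than once.
PAGE = """
<html><body>
  <ul>
    <li><span>3 years</span></li>
    <li><span>5 years</span></li>
  </ul>
  <ul>
    <li><span>7 years</span></li>
  </ul>
</body></html>
"""

root = ET.fromstring(PAGE)

# No positional predicate: every matching node is returned.
all_ages = [s.text for s in root.findall(".//li/span")]

# li[1]: the first <li> under EACH parent, so one hit per <ul>.
first_per_list = [s.text for s in root.findall(".//li[1]/span")]

# find() stops at the first match in document order: one value per page.
first_on_page = root.find(".//li/span").text

print(all_ages)        # ['3 years', '5 years', '7 years']
print(first_per_list)  # ['3 years', '7 years']
print(first_on_page)   # 3 years
```

In the spider from the question, the equivalent of the last step would be to keep only the first element of the list that .re() or .extract() returns, for example by slicing the result with [:1].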
